General Disclaimer 


One or more of the Following Statements may affect this Document 


• This document has been reproduced from the best copy furnished by the 
organizational source. It is being released in the interest of making available as 
much information as possible. 


• This document may contain data, which exceeds the sheet parameters. It was 
furnished in this condition by the organizational source and is the best copy 
available. 


• This document may contain tone-on-tone or color graphs, charts and/or pictures, 
which have been reproduced in black and white. 


• This document is paginated as submitted by the original source. 


• Portions of this document are not fully legible due to the historical nature of some 
of the material. However, it is the best reproduction available from the original 
submission. 


Produced by the NASA Center for Aerospace Information (CASI) 



SOFTWARE ENGINEERING LABORATORY SEL-82-007



PROCEEDINGS OF THE 
SEVENTH ANNUAL SOFTWARE 




ENGINEERING WORKSHOP


DECEMBER 1982




NASA 

National Aeronautics and 
Space Administration

Goddard Space Flight Center 

Greenbelt, Maryland 20771 


PROCEEDINGS 


OF 

SEVENTH ANNUAL SOFTWARE ENGINEERING WORKSHOP 


Organized by

Software Engineering Laboratory 
GSFC 


December 1, 1982


GODDARD SPACE FLIGHT CENTER 
Greenbelt, Maryland 


FOREWORD 


The Software Engineering Laboratory (SEL) is an organization 
sponsored by the National Aeronautics and Space Administra- 
tion Goddard Space Flight Center (NASA/GSFC) and created for 
the purpose of investigating the effectiveness of software 
engineering technologies when applied to the development of 
applications software. The SEL was created in 1977 and has 
three primary organizational members: 

NASA/GSFC (Systems Development and Analysis Branch) 

The University of Maryland (Computer Sciences Department) 
Computer Sciences Corporation (Flight Systems Operation) 

The goals of the SEL are (1) to understand the software
development process in the GSFC environment; (2) to measure
the effect of various methodologies, tools, and models on
this process; and (3) to identify and then to apply
successful development practices. The activities, findings,
and recommendations of the SEL are recorded in the Software
Engineering Laboratory Series, a continuing series of reports
that includes this document. A version of this document was
also issued as a NASA/GSFC document in 1982.

Single copies of this document can be obtained by writing to 

Frank E. McGarry 
Code 582.1 
NASA/GSFC 

Greenbelt, Maryland 20771 





SEVENTH ANNUAL SOFTWARE ENGINEERING WORKSHOP 


ABOUT THE WORKSHOP 


The Seventh Annual Software Engineering Workshop was held on December 1, 1982, at Goddard
Space Flight Center in Greenbelt, MD. Nearly 250 people, representing 9 universities, 22 agencies
of the federal government, and 43 private organizations, attended the meeting.

As in the past 6 years, the major emphasis for this meeting was the reporting and discussion of 
experiences in the identification, utilization, and evaluation of software methodologies, models, 
and tools. Twelve speakers took part in four separate sessions, each of which had a panel format
with heavy participation from the audience.

The workshop is organized by the Software Engineering Laboratory (SEL), whose members represent
NASA/GSFC, the University of Maryland, and Computer Sciences Corporation (CSC). The
meeting has been an annual event for the past 7 years (1976 to 1982), and there are plans to continue
these yearly meetings as long as they are productive.

The record of the meeting is generated by members of the SEL and is printed and distributed by
the Goddard Space Flight Center. All persons who are registered on the SEL mailing list
receive copies of the proceedings at no charge.

Additional information about the workshop or about the SEL may be obtained by contacting: 


Mr. Frank McGarry
Code 582.1
NASA/GSFC
Greenbelt, MD 20771

301-344-5048





AGENDA 


SEVENTH ANNUAL SOFTWARE ENGINEERING WORKSHOP 
NASA/GODDARD SPACE FLIGHT CENTER 
BUILDING 3 AUDITORIUM 
DECEMBER 1, 1982


8:00 a.m.

Registration “Sign-In” 
Coffee-Donuts 


8:30 a.m.

INTRODUCTORY REMARKS 
"What Have We Learned in 6 Years?” 

F. E. McGarry (NASA/GSFC) 

9:00 a.m.

SESSION NO. 1 

TOPIC: The Software Engineering 
Laboratory (SEL) 



Discussant: J. Page (CSC) 


"Software Errors and Complexity, 
An Empirical Investigation" 

V. Basili (University of MD) 


"When and How to Use a Software 
Reliability Model" 

A. Goel (Syracuse University) 


"Measuring the Application of 
Software Prototypes” 

M. Zelkowitz (University of MD) 

10:30 a.m.

BREAK 


11:00 a.m. 

SESSION NO. 2 

TOPIC: Software Tools 



Discussant: P. Scheffer 

(Martin Marietta) 


"Experience and Perspectives 
with SRI’s Tools for Software 
Design and Validation” 

J. Goguen (SRI) 

K. Levitt (SRI) 


"Technology Transfer Software 
Engineering Tools” 

I. Miyamoto (University of MD) 


“Design Aids for Real-Time Systems” 

P. Szulewski (Draper Labs) 

12:30 p.m. 

LUNCH 




1:30 p.m.

SESSION NO. 3

TOPIC: Software Errors



Discussant: D. Simkins (IBM)


"Software Error Data Collection
and Categorization"

T. Ostrand (Sperry Univac)

E. Weyuker (Courant Inst.)


"An Effective Bug Classification
Scheme Must Take the Programmer
Into Account"

E. Soloway (Yale)

W. Johnson (Yale)

S. Draper (University of CA)


"Software Anomaly Taxonomy--
What Can Be Gained?"

D. Buckland (Reifer Consultants)

3:00 p.m.

BREAK


3:30 p.m.

SESSION NO. 4

TOPIC: Cost Estimation



Discussant: D. Card (CSC)


"Maintenance Estimation
Methodology"

K. Rone (IBM)


"Staffing Implications of Software
Productivity Models"

R. Tausworthe (JPL)


"Estimates of Software Size From
State Machine Designs"

R. Britcher (IBM)

J. Gaffney (IBM)


5:00 p.m.


ADJOURN



SUMMARY OF THE SESSIONS: SEVENTH ANNUAL SOFTWARE 

ENGINEERING WORKSHOP 


Michael Rohleder 

COMPUTER SCIENCES CORPORATION 
and 

THE GODDARD SPACE FLIGHT CENTER 
SOFTWARE ENGINEERING LABORATORY 

Prepared for the 
NASA/GSFC 

Seventh Annual Software Engineering Workshop 
December 1982 


INTRODUCTORY REMARKS 


Frank McGarry - "What Have We Learned in Six Years?" 

Frank McGarry of the Goddard Space Flight Center (GSFC) 
opened the workshop with a summary of results obtained from 
the analysis of data collected by the Software Engineering 
Laboratory (SEL). The SEL has monitored 46 software
development projects at GSFC during the past 6 years. The
discussion covered the areas of profiles, models, and
methodologies. Within these areas, a number of results were
presented.

The use of modern programming practices (MPP) favorably
affects productivity and reliability. A 15-percent increase
in productivity was demonstrated; however, the effect of
MPP on reliability was found to be highly variable.
Programmer ability and experience were shown to have the
greatest influence on the productivity of the software
development process. Studies of reliability and cost models
were inconclusive. More theoretical development of and
practical experience with such models are needed before they
can be applied effectively in a production environment.

The costs of data collection were identified and
quantified. These include task overhead, data processing,
and data analysis. Data collection is expensive, but it is
essential to understanding and improving the software
development process.

In response to questions and comments from the audience, 
McGarry clarified several points: 

• A number of methodologies have proved to be cost
effective in the GSFC environment. However,
numerical values for the benefits and costs of
individual methodologies are difficult to
determine. The maximum savings observed were about 15
to 20 percent for a combination of MPP.

• Except for errors, data from the maintenance phase
was not included in these analyses.


M. Rohleder
CSC
1 of 17




SESSION 1 - THE SOFTWARE ENGINEERING LABORATORY 

Victor Basili --"Software Errors and Complexity, An Empirical
Investigation"

The first speaker of the first session was Victor Basili of 
the University of Maryland. This presentation focused on 
the distributions and relationships derived from error data 
collected during the development of a medium-scale software 
project. The error characteristics of this project were 
shown to reflect significant differences between this proj- 
ect and the class of projects usually studied by the SEL. 

Modified and new modules were shown to differ in the types 
of errors prevalent in each and the amount of effort re- 
quired to correct an error. Modified modules appeared to be 
more susceptible to errors due to the misunderstanding of 
specifications. One surprising result presented by Basili 
was that an increase in module size did not increase error 
proneness. In fact, larger modules were shown to be less 
error prone. This was true even though the larger modules 
were more complex. A number of explanations for this phe- 
nomenon were suggested. 
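The size result rests on error density rather than raw error counts: a large module can contain more errors in total yet fewer errors per line. The sketch below shows that computation under stated assumptions; the size-class boundaries, the record layout, and the sample modules are hypothetical and are not Basili's data.

```python
from collections import defaultdict

# Hypothetical module record: (module name, source lines, errors reported).
Module = tuple[str, int, int]

def error_density_by_size(modules: list[Module], bounds: list[int]) -> dict[str, float]:
    """Errors per 1,000 source lines, pooled within each size class.

    `bounds` holds ascending, exclusive upper limits for the size classes;
    modules at or above the last bound fall into a final open-ended class.
    """
    lines = defaultdict(int)
    errors = defaultdict(int)
    for _name, size, errs in modules:
        label = next((f"<{b}" for b in bounds if size < b), f">={bounds[-1]}")
        lines[label] += size
        errors[label] += errs
    return {label: 1000 * errors[label] / lines[label] for label in lines}

# Invented sample in which the larger modules have more errors but a lower density.
sample = [("A", 40, 2), ("B", 60, 2), ("C", 250, 3), ("D", 350, 3)]
print(error_density_by_size(sample, [100, 200]))  # {'<100': 40.0, '>=200': 10.0}
```

Pooling lines and errors within a class, rather than averaging per-module densities, keeps very small modules from dominating the result; either choice is an assumption here.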

In response to questions and comments from the audience, 
Basili clarified the following points: 

• Errors of commission were those errors caused by an 
incorrect program statement. Errors of omission 
were those errors that resulted from forgetting to 
include a statement or parameter.

• A large portion of the errors was attributed to a 
misunderstanding of specifications or requirements. 

• The effect of programmer experience was considered 
in the investigation. 





• Additional work is required to determine the op- 
timum size of modules with respect to reliability. 

• Errors caused by earlier error correction efforts 
were found to be, at most, 6 percent of the total. 

• Data was not available on the time required to cor- 
rect errors in large versus small modules. 





Amrit Goel --"When and How To Use a Software Reliability
Model"

The second speaker of the session was Amrit Goel from
Syracuse University (on leave to the University of
Maryland). This presentation dealt with the role of
software errors in determining the reliability of
large-scale, computer-based systems. The use of stochastic
and combinatorial models to assess system reliability in the
presence of failures caused by software errors was examined.
Goel suggested that users often employ whichever models are
readily available on their computer systems rather than the
models most appropriate to their development environments;
this was attributed to incorrect or ambiguous interpretations
of model assumptions and output.

Goel presented views about the utility of the available 
models during various stages of the development process and 
in different testing situations. Alternatives to reliabil- 
ity models were also suggested for occasions when the cur- 
rently available models do not seem to be applicable. 

The following points were made by the audience in response 
to the presentation: 

• Rick Gale pointed out that software testing should 
be driven by reliability model measures. 

• John Musa agreed that appropriate testing is
necessary to obtain valid results from a model.





Marvin Zelkowitz --"Measuring the Application of Software

Prototypes"

The last speaker of the first session was Marvin Zelkowitz 
of the University of Maryland. This presentation covered 
the development and application of prototypes for software 
systems. The differences between models and prototypes were
identified, as were the essential elements common to both.
Environmental considerations and their influence on
prototype development were also discussed.

An ongoing experiment in prototyping, the Flight Dynamics
Attitude Simulator (FDAS), was described. A number of
factors motivated the choice of the prototyping approach for
the development of this system, including uncertainties
about size, requirements, and interfaces.

In response to questions and comments from the audience, 
Zelkowitz clarified the following points: 

• The major goal in the development of this prototype 
is to examine project requirements and feasibility 
more closely. Specifications for the full system 
will be based on the results of the prototyping 
experience. 

• The need for prototype development stems from the 
fact that FDAS is a very different type of system 
from those usually developed in this environment. 

• Prototypes are not built merely to "tack on" ad- 
ditional features at a later date to build the full 
system. Some elements may migrate to the full 
system, however. 

• Elaine Weyuker disagreed with the 10-percent esti- 
mate for the cost of a prototype versus full imple- 
mentation and suggested that 30 percent is more 
realistic in a nonacademic environment. 





SESSION 2 - SOFTWARE TOOLS 

Karl Levitt and Joseph Goguen --"Experience and Perspectives
With SRI's Tools for Software
Design and Validation"

The initial speakers of the second session were Karl Levitt 
and Joseph Goguen from SRI International. The joint presen- 
tation described current approaches to software tools for 
design specification and presented experiences with several 
projects at SRI. 

Four development tools were introduced: the STP theorem
prover and its associated Design Verification System; PHIL,
a meta-programmable, context-sensitive structured editor;
Pegasus, a system for supporting graphics programming; and
OBJ, an ultra-high-level programming language based on
rewrite rules and abstract data types.

The speakers described successful efforts to apply these
tools to design specification and verification for two
classes of systems in which reliability is vital:
fault-tolerant systems for aircraft control and secure
operating systems.

In response to questions and comments from the audience, the 
following points were clarified: 

• A major purpose of a specification language is to 
support the decomposition and testing of designs at 
an early stage. 

• The most compelling reason for the lack of formal
specification languages with tool support is the
absence of examples that model good specifications
having the right amount of detail.





Isao Miyamoto --"Technology Transfer Software Engineering
Tools"

The second speaker of the session was Isao Miyamoto from the 
University of Maryland, Baltimore County, who discussed 
technology transfer as it applies to software engineering 
tools. 

Experiences with tool usage and availability were pre- 
sented. Miyamoto identified three reasons that tools are 
not used: 

1. Lack of a clearly defined methodology 

2. Economic ineffectiveness 

3. Lack of measures and criteria for evaluating the 
effectiveness of tools 

An example was presented of a software maintenance support 
tool system called "Pandora's Box." This system provides 
users with a hierarchical network of menus designed to 
provide user-friendly capabilities from novice to expert. 

It is hoped that the project will produce a tool that will 
gain user acceptance. 

In response to a question from the audience, Miyamoto
clarified the following point: designing easy-to-use,
cost-effective tools is the key to transferring software
engineering technology from the research laboratory to users.





Paul Szulewski --"Design Aids for Real-Time Systems" 


The last speaker of the session was Paul Szulewski of the
Draper Laboratory. The presentation described ongoing ef- 
forts with Design Aids for Real-Time Systems (DARTS). This 
tool assists in defining embedded computer systems through 
tree-structured graphics, military standards documentation 
support, and various analyses including calculation of 
Halstead's Software Science measures. 

DARTS uses a mix of hierarchical organization, control con- 
ventions, communications primitives, and data structures to 
represent real-time systems. Requirements are expressed as 
a functional hierarchy, and the design is represented as a 
tree-structured hierarchy of communicating processes. 

Through a user-friendly, menu-oriented interface, a user can 
define a system; perform data flow checking; generate sim- 
ulations of response time, throughput, and utilization; 
request a variety of data tables and graphical tree- 
structured output in various sizes; and calculate Software 
Science measures. 

In response to questions and comments from the audience, 
Szulewski clarified the following points: 

• DARTS is operational on an Amdahl 470 V8. It
consists of approximately 20,000 lines of PL/I code.

• DARTS has not been used thus far for applications 
such as PERT charting. 

• Tool availability and desirability from a user's 
standpoint are important aspects of tool design. 





SESSION 3 - SOFTWARE ERRORS

Thomas Ostrand --"Software Error Data Collection and
Categorization"

The first speaker of the third session was Thomas Ostrand of
Sperry Univac, who presented the results of a research
project done jointly with Elaine Weyuker. The project
analyzed the relationship of error characteristics to
various aspects of the software development process by
studying software errors committed during the development of
an interactive, special-purpose editor system. A new error
categorization system was developed, and 1’4 errors were
classified with this scheme.

The new error categorization scheme was developed from
programmers' descriptions of errors, their symptoms, and
corrections. Four attributes, or dimensions, of software
errors were identified; each error was classified by
assigning it a value for each dimension. These dimensions
and their possible values reflect the specific errors
identified during the project. The dimensions include major
category, type, presence, and use.

In response to questions and comments from the audience,
Ostrand clarified the following points:

• Good rapport with the programmers is vital to
successful data collection efforts.

• Design was done informally. Flowcharts, formal
requirements, and specifications were not used.

• The importance of relevant information in data
collection efforts cannot be overemphasized.





Elliot Soloway --"An Effective Bug Classification Scheme Must
Take the Programmer Into Account"

The next speaker of the third session was Elliot Soloway of
Yale University, who presented a paper coauthored by
W. Johnson, also of Yale, and S. Draper of the University of
California. This presentation defined a particular view of
bug classification. Rather than looking at productivity or
reliability, the goal in looking at program bugs was to
provide a basis for building computer-based tutoring systems
that can aid the novice in learning to program. The
conclusion is that bugs are not random occurrences but,
rather, are systematic and provide a window into
misconceptions that novices have about programming.

Developing a classification scheme for bugs based solely on
the surface features of the programs themselves is
insufficient to uniquely classify bugs, and it ignores the
underlying misconception. What is needed are heuristic rules
based on a hypothesis of what the programmer's intentions
were as he or she created the program. Classifying bugs must
take the programmer into account.

In response to questions and comments from the audience,
Soloway clarified the following points:

• Careless programming practices produce more errors
in code. Classification of these errors becomes
increasingly difficult as the number of errors
increases.

• Errors in programs can be classified using
information about how they were fixed.

• Vic Basili distinguished between errors and
faults. Finding a fault leads to a search for the
error.




• Care must be taken to ensure the quality of data
collected.





Donna Buckland --"Software Anomaly Taxonomy--What Can Be

Gained?"

The last speaker of the third session was Donna Buckland of
Reifer Consultants. This presentation discussed the results 
of a study to categorize software errors that had been re- 
ported during the stages of testing and operational use of 
the Deep Space Network DSN/Mark 3 system and to build a data 
base for subsequent analysis. 

A three-dimensional classification scheme was devised to 
capture error data for statistical and trend analysis. 

These dimensions are time of occurrence, error criticality, 
and error category. The first dimension defines the par- 
ticular software life cycle phase in which the error was 
introduced. Criticality assesses the severity of the 
error. Error category defines the cause of the error. 

Buckland stated that the collection and classification of 
software error data provides management with a powerful tool 
for isolating problem areas. The data can be used to iden- 
tify error-prone modules and serve as a basis for making 
repair and/or replacement decisions. 
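A three-dimensional scheme of this kind maps naturally onto a record with one field per dimension, and error-prone modules fall out of a simple tally. A minimal sketch, assuming each report also names the module it was charged to; the enumeration values, category strings, and module names are hypothetical placeholders, not the values used in the DSN/Mark 3 study.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum

class Phase(Enum):
    """Time of occurrence: life cycle phase in which the error was introduced."""
    REQUIREMENTS = "requirements"
    DESIGN = "design"
    CODE = "code"
    TEST = "test"

class Criticality(Enum):
    """Severity of the error (a higher value is more severe)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3

@dataclass(frozen=True)
class ErrorReport:
    module: str
    phase: Phase
    criticality: Criticality
    category: str  # cause of the error; these strings are placeholders

def error_prone_modules(reports, min_errors=2, min_criticality=Criticality.MEDIUM):
    """Modules with at least min_errors reports at or above min_criticality,
    most error-prone first."""
    counts = Counter(r.module for r in reports
                     if r.criticality.value >= min_criticality.value)
    return [(module, n) for module, n in counts.most_common() if n >= min_errors]

reports = [
    ErrorReport("NAV", Phase.DESIGN, Criticality.HIGH, "logic"),
    ErrorReport("NAV", Phase.CODE, Criticality.MEDIUM, "interface"),
    ErrorReport("IO", Phase.CODE, Criticality.LOW, "data"),
    ErrorReport("IO", Phase.TEST, Criticality.HIGH, "logic"),
]
print(error_prone_modules(reports))             # [('NAV', 2)]
print(Counter(r.phase.value for r in reports))  # errors introduced per phase
```

Filtering on criticality before counting keeps a module with many trivial reports from outranking one with a few severe ones; the threshold is a judgment call.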

In response to questions and comments from the audience, 
Buckland clarified the following points: 

• Quantification of error data is a very important
tool.

• The length of time required to fix a problem is 
also very important and is sometimes overlooked. 

• Vic Basili pointed out that it is often difficult 
to get an individual who fills out a change/error 
report to understand exactly what information is 
needed.





SESSION 4 - COST ESTIMATION 


Kyle Rone --"Maintenance Estimation Methodology"

The first speaker of the fourth session was Kyle Rone of the 
International Business Machines Corporation (IBM). This 
presentation described a systematic approach to providing 
estimates for both staffing and skill levels during the 
maintenance phase of a project. 

The approach presented uses a Rayleigh curve method of pro- 
jection combined with a modified matrix method to forecast 
maintenance needs and required staffing levels. The curves 
generated by both methods are differenced to ascertain how 
much new work can be performed given the staffing level. 
Actual data is compared to projections to validate or modify 
the process. 
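A Rayleigh (Norden) curve models staffing as m(t) = (K/t_d²) · t · exp(-t²/(2t_d²)), where K is the total effort under the curve and t_d is the time of peak staffing. The sketch below assumes maintenance demand follows such a curve and differences it against a constant staff on hand; the modified matrix method from the talk is not reproduced, so the flat staffing level stands in for it as an assumption, and all parameter values are hypothetical.

```python
import math

def rayleigh_staff(t: float, total_effort: float, t_peak: float) -> float:
    """Projected staff at time t on a Rayleigh curve that peaks at t_peak
    and encloses total_effort (for example, staff-months)."""
    return (total_effort / t_peak**2) * t * math.exp(-t**2 / (2 * t_peak**2))

def capacity_for_new_work(months, total_effort, t_peak, staff_on_hand):
    """Difference between staff on hand and projected maintenance demand,
    month by month; positive values are staff free for new work."""
    return [staff_on_hand - rayleigh_staff(m, total_effort, t_peak) for m in months]

# Hypothetical: 120 staff-months of maintenance peaking in month 6, 10 staff on hand.
months = range(0, 25, 6)
for month, spare in zip(months, capacity_for_new_work(months, 120, 6, 10)):
    print(month, round(spare, 1))
```

Re-running the projection as actual data arrive, and refitting the curve, matches the point below that estimation must be applied over and over again.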

In response to questions and comments from the audience,
Rone clarified the following points:

• Estimation is not a one-time process; it must be 
applied over and over again. 

• Maintenance activities include correction of both 
latent and ongoing errors. 

• The amount of maintenance required can be reduced
by applying more quality control during early
development phases. Quality is cheaper in the long
run.

• Frank McGarry stated that independent verification 
and validation (IV&V) is appropriate for projects 
with high reliability requirements. The effect of 
IV&V on maintenance costs has not been assessed by
the SEL. 





• Dave Card asked whether unmaintainable software has
ever been encountered. The response from Rone was 
that such software has been encountered and must be 
disposed of. 

• The type of model used in estimation is not as 
important as using a given model regularly with 
good techniques that are transportable. 





Robert Tausworthe --"Staffing Implications of Software
Productivity Models"

The second speaker of the fourth session was
Robert Tausworthe of the Jet Propulsion Laboratory (JPL).

His presentation investigated the implications of equating a
project staffing model with an intercommunication overhead
model in a small neighborhood of project effort. Highlights
of the study include the following. There is a calculable
maximum effective staff level for any project, beyond which
additional staff does not increase the production rate; this
limits the extent to which effort and time may be traded
effectively. In practice, it becomes ineffective to expend
more than an additional 25 to 50 percent of resources in
order to reduce delivery time. In addition, the project
intercommunication overhead can be determined from the
staffing level for a given project.
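The existence of a maximum effective staff level can be illustrated with a deliberately simple overhead model in which each person loses a fixed fraction c of his or her time to every other team member. This pairwise model is an assumption chosen for illustration and is not Tausworthe's formulation; the value of c is hypothetical. Setting the derivative of n(1 - c(n - 1)) with respect to n to zero gives n* = (1 + c)/(2c).

```python
def overhead_fraction(n: int, c: float) -> float:
    """Fraction of each person's time spent communicating with the other n - 1 people."""
    return c * (n - 1)

def production_rate(n: int, per_person: float, c: float) -> float:
    """Team output per unit time after intercommunication overhead."""
    return n * per_person * (1 - overhead_fraction(n, c))

def max_effective_staff(c: float) -> float:
    """Staff level where d/dn [n(1 - c(n - 1))] = 1 + c - 2cn equals zero."""
    return (1 + c) / (2 * c)

c = 0.04  # hypothetical: 4 percent of each person's time per communication partner
best = max(range(1, 30), key=lambda n: production_rate(n, 1.0, c))
print(best, max_effective_staff(c))
```

In this toy model, staffing beyond n* lowers total output, and at n = 1 + 1/c the team produces nothing; the overhead fraction at any staffing level follows directly from n, echoing the last highlight above.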

The following point was clarified by Tausworthe in response
to a question from the audience: Dave Card asked whether
intercommunication overhead could be reduced by dividing a
project into a number of tasks that communicate only through
the manager. Tausworthe replied that the increased
management activity would increase overhead costs even faster.





John Gaffney --"Estimates of Software Size From State Machine
Designs"

The final speaker of the fourth session was John Gaffney of 
the National Weather Service, on loan from IBM, who presented 
a paper coauthored by Robert Britcher of IBM. The presenta- 
tion explained how the length or size of programs (in number 
of source lines of code) represented as state machines can 
be reliably estimated in terms of the number of internal 
state machine variables. Variables here are defined as the 
unique data required by a state machine's transition func- 
tion, not the data retained in the state machine's memory. 
These are equivalent to Halstead's operands. The method- 
ology presented can be employed at successive stages of the 
development process to provide increasingly accurate esti- 
mates. 
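Because the state machine variables are equated with Halstead's operands, Halstead's standard length estimator is a useful reference point: N^ = n1·log2(n1) + n2·log2(n2), with n1 distinct operators and n2 distinct operands. How Gaffney and Britcher converted variable counts into source lines is not described in this summary, so the tokens-per-line ratio below is a hypothetical, locally calibrated parameter and not part of their method.

```python
import math

def halstead_length(n1: int, n2: int) -> float:
    """Halstead's estimated length in tokens: N^ = n1*log2(n1) + n2*log2(n2),
    where n1 = distinct operators and n2 = distinct operands (both >= 1)."""
    return n1 * math.log2(n1) + n2 * math.log2(n2)

def estimated_lines(n1: int, n2: int, tokens_per_line: float) -> float:
    """Hypothetical conversion from tokens to source lines using a ratio
    calibrated on past projects; not taken from the talk."""
    return halstead_length(n1, n2) / tokens_per_line

# Hypothetical design: 30 distinct state machine variables (operands),
# 20 distinct operators, and 4 tokens per source line from local history.
print(round(estimated_lines(20, 30, 4.0)))
```

As the design is refined and the variable count firms up, the same calculation can be repeated, in the spirit of the successive estimates described above.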

The following points were made during the ensuing discussion: 

• Kyle Rone asserted that cost estimation is not an 
exact science; it is a way of accumulating experi- 
ence to make accurate estimates in a given environ- 
ment. 

• Dave Card suggested that different analysts might
decompose a state machine model differently and
thus get different results. Gaffney replied that
such differences could be significant but that
they could be minimized by careful and consistent
application of the decomposition technique.





WHAT HAVE WE LEARNED IN THE LAST 6 YEARS





MEASURING SOFTWARE DEVELOPMENT TECHNOLOGY 

BY 


FRANK E. McGARRY 
GODDARD SPACE FLIGHT CENTER 


In late 1976, the Goddard Space Flight Center (GSFC) initiated an effort to create
a software laboratory where various software development technologies and
methodologies could be studied, measured and enhanced. This laboratory became
known as the Software Engineering Laboratory (SEL), and since its inception it has
been actively conducting studies and experiments utilizing flight dynamics
projects in a production environment. The SEL evolved into a full partnership
among GSFC, the University of Maryland and Computer Sciences Corporation (CSC).

The approach that the SEL has taken in carrying out the studies has been to
apply varying methodologies, tools, management concepts, etc. to software
projects at Goddard, and then to closely monitor the entire development cycle so
that the entire process and product can be compared to those of similar projects
utilizing somewhat different approaches. This monitoring function led to a need
to collect, store and interpret great amounts of data pertaining to all phases
of the software process, product, environment and problem. This data collection
and data processing process has been applied to over 40 software projects
ranging in size from 2,000 lines of code to approximately 120,000 lines of code,
with the typical project running about 55,000 lines of code.

The data that has been collected (and is still being collected) and interpreted 
for these projects comes from 5 sources: 

1. Data Collection forms utilized by programmers, managers and support
personnel. Typical types of data collected include:

o Error and Change Information
o Weekly Hours and Resources
o Component Effort (hours expended on each component by week)
o Project Characteristics
o Computer Run Analysis
o Change and Growth History (week by week records of source code)

(Additional information is contained in References 1 and 2.)

2. Computer Accounting Information

3. Personnel Interviews (during and after the development process)

4. Management and Technical Supervisor Assessments

5. Tools (used to extract data and measures from source code)


F. McGarry 
NASA/GSFC 
1 of 34 


For the more than 40 projects which have been monitored, approximately 21,000
forms have been processed and are continually used to perform studies of the
software development process. To support the storage, validation and usage of
this information, a data base was designed and built on a PDP-11/70 at Goddard.
(Reference 1)

Approach (Chart 2) 

The steps that have been taken to carry out the investigation within the SEL 
have been: 

1. Develop a profile of the software development process as it is
'now'. First we must understand what we do well and what we do not do so well,
so that we can build a baseline of current characteristics against which we can
later honestly measure change.

2. Experiment with similar type projects. The second step has been to
apply selected tools, methodologies and approaches to software projects so they
can be studied for effect.

3. Measure the process and product. As projects are developed which
utilize different software development techniques, the SEL uses the
extracted data to determine whether or not the applied technology has made any
measurable impact on the software characteristics (this may include reliability,
productivity, complexity, etc.).
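Step 3 amounts to comparing an experimental project's measures against the baseline built in step 1. A minimal sketch, assuming productivity is expressed as lines of code per staff-hour and the baseline is summarized by its mean; the SEL's actual statistical treatment is not described here, and the project figures are hypothetical.

```python
from statistics import mean

def productivity(lines_of_code: int, staff_hours: float) -> float:
    """Lines of code produced per staff-hour (one possible productivity measure)."""
    return lines_of_code / staff_hours

def percent_change_from_baseline(baseline: list[float], observed: float) -> float:
    """Percent difference between an experimental project's measure and the
    mean of the same measure over the baseline projects."""
    reference = mean(baseline)
    return 100.0 * (observed - reference) / reference

# Hypothetical figures: three baseline projects and one experimental project.
baseline = [productivity(50_000, 20_000), productivity(30_000, 10_000),
            productivity(40_000, 16_000)]
change = percent_change_from_baseline(baseline, productivity(55_000, 19_000))
```

The same comparison applies to any measure in the profile (error rate, complexity) as long as it is computed identically for the baseline and experimental projects.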

Environment (Chart 1) 

The projects which have been monitored and studied are primarily flight
dynamics-related software systems. This software includes applications to
support attitude determination, attitude control, maneuver planning, orbit
adjust and general mission analysis.

The attitude systems normally have very similar characteristics, and all are
designed to utilize graphics as well as to run in batch mode. Depending on the
problem characteristics, the typical attitude systems range in size from 30,000
to over 120,000 lines of code.* The percentage of reused code ranges from less
than 10 percent to nearly 70 percent, with the average software package being
composed of approximately 30 percent reused code.

The applications are primarily scientific in nature with moderate reliability 
requirements and normally are not required to run in real time. The development 
period typically runs for about 2 years (from Requirements Analysis through 
Acceptance Testing). The development computers are typically a group of IBM
S/360s with very limited resources and low reliability (typically an MTBF of
less than 3 hours).

Details describing the environment can be found in Reference 1. 


*Here, a line of code is any 80-byte record processable by a compiler or
assembler (i.e., comments are included).
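Because this definition counts every record the compiler reads, comment lines raise the total. A minimal sketch that counts a source deck both ways, assuming fixed-form FORTRAN conventions (a 'C', 'c', or '*' in column 1 marks a comment) purely for illustration; the sample deck is hypothetical.

```python
def count_lines(records: list[str]) -> dict[str, int]:
    """Count source records under the SEL definition (every record, comments
    included) and, for comparison, excluding comment records."""
    total = len(records)
    # Assumed fixed-form FORTRAN convention: 'C', 'c', or '*' in column 1.
    comments = sum(1 for record in records if record[:1] in ("C", "c", "*"))
    return {"all_records": total, "non_comment": total - comments}

deck = [
    "C     COMPUTE SPIN AXIS ATTITUDE",
    "      X = Y + Z",
    "      CALL ROTATE(X)",
    "      END",
]
print(count_lines(deck))  # {'all_records': 4, 'non_comment': 3}
```

Size figures from different sources should be compared only after confirming which of the two counts they use.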


F. McGarry
NASA/GSFC
2 of 34 



Experiments Completed (Chart 4)


As was mentioned earlier, the SEL has monitored over 40 software development projects during its 6 years of operation. During this time period, numerous methodologies, models, tools, and general software approaches have been applied and measured. The summary results to be presented are based on these projects. The summary will be divided into 3 topic areas:

1. Profiles of the Development Process 

2. Models 

3. Methodologies 





Profiles of the Development Process (Charts 6 through 12)

The first step in attempting to measure the effectiveness of any software technology is to generate a baseline or profile of how one typically performs the job. Then, as modified approaches are attempted on similar projects, the effects may be apparent by comparison.

Resource Allocation (Chart 7)

One set of basic information that one may want to understand is just where programmers spend their time. When the SEL looked at numerous projects to understand where the time was spent, it found that the SEL environment deviated somewhat from the old 40-20-40 rule. Typically, projects indicated that when the total hours expended were based on the phase dates of a project (i.e., a specific date defined the absolute completion of one phase of the cycle and the beginning of the next phase), the breakdown was less than [illegible] percent for design, close to [illegible] percent for code, and about 10 percent for integration and test.

When the programmers provided weekly data attributing their time to the activity that they felt they were actually doing, no matter what phase of software development they were in, the profile looked quite different. The 3 activities (design, code, test) each consumed approximately the same percentage of effort, and over [illegible] percent of the time was attributed to 'other' activities (such as travel, training, unknown, etc.). The SEL has continually found that this 'other' effort exists, cannot easily be reduced, and most probably should be accepted as a given. The SEL has found it to be a mistake to attempt to increase productivity merely by eliminating major portions of this 'other' time.

Development Resources (Chart 8) 

Another area of concern to the SEL in defining the basic profile of software development was that of staffing level and resource expenditure profiles. Many authorities subscribe to the view that there is an optimal staffing level profile which should be followed for all software projects; profiles such as the Rayleigh Curve are suggested as optimal. Chart 8 depicts characteristics of classes of projects monitored in the SEL and shows the differences in productivity and reliability for groups of projects having different staffing level profiles. Although the Rayleigh Curve may be acceptable for some projects, the SEL has found that wide variations in these characteristics still lead to successful projects. The SEL has also found that extreme deviations may be indicative of problem software.

(Detailed information can be found in References 4 and 5.)





Productivity for large vs. small systems (Chart 9) 


The common belief among many software managers and developers is that as the size of a software system increases, its complexity increases at a higher rate than the lines of code. Because of this belief, it is commonly assumed that in the effort equation

$$E = aL^{b}$$

where E = effort (person-time) and L = lines of code, the value of b must be greater than 1. The projects that the SEL has studied have not verified this belief; instead, the SEL has found the value of b to be approximately 0.92 in its environment. Because this equation is nearly linear, it leads to the counterintuitive point that a project of 150,000 lines of code will cost approximately 3 times as much as a 50,000-line project, rather than the 4 or 5 times that is commonly believed.

(Further details can be found in Reference 6.) 
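The arithmetic behind the 150,000 vs. 50,000 line comparison can be checked with a short sketch. It assumes only the effort equation E = aL^b with the SEL exponent b = 0.92; because the coefficient a cancels when two projects are compared, its value is not needed. The function names are illustrative and are not part of any SEL tool.

```python
import math


def effort_ratio(size_large: float, size_small: float, b: float) -> float:
    """Ratio of efforts E_large / E_small under E = a * L**b (a cancels out)."""
    return (size_large / size_small) ** b


def implied_exponent(effort_ratio_value: float, size_ratio: float) -> float:
    """Exponent b that would make a size_ratio-times larger project cost
    effort_ratio_value times as much."""
    return math.log(effort_ratio_value) / math.log(size_ratio)


if __name__ == "__main__":
    # With the SEL exponent, tripling the size roughly triples the effort.
    print(f"b = 0.92 -> {effort_ratio(150_000, 50_000, 0.92):.2f}x effort")
    # Exponents that the "4 or 5 times" belief would require.
    for ratio in (4, 5):
        print(f"{ratio}x effort needs b = {implied_exponent(ratio, 3):.2f}")
```

With b = 0.92 the ratio comes out at about 2.75, whereas the "4 or 5 times" belief corresponds to an exponent between roughly 1.26 and 1.46.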

Productivity Variation (Chart 10) 

Another characteristic that the SEL has been interested in studying is the variation in programmer productivity. Obviously one would want to increase productivity by whatever approach is found to be effective, but first we must clearly understand the baseline characteristics of productivity (minimum, maximum, average, difference between small and large projects, etc.); only then will we know whether we have improved in the years to come.

As other researchers have found in varying environments, the productivity of different programmers can easily differ by a factor of [illegible] or 10 to 1. The SEL did find a greater variation in small projects (from a very low productivity of 0.5 l.o.c./hour to 10.8 l.o.c./hour). The probable reason is that newer people are typically put on smaller projects, and the SEL has found extreme differences among relatively inexperienced personnel.
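Establishing the baseline described above amounts to simple summary statistics. The sketch below is a hypothetical helper, not an SEL tool: each entry pairs a productivity figure (for a project or for a person) with the size of the project it comes from, and the entries are grouped using the 20K SLOC large-project threshold given with Chart 10. It reports the minimum, maximum, mean, and max/min spread, and assumes all productivity values are positive.

```python
from statistics import mean

LARGE_PROJECT_SLOC = 20_000  # Chart 10: a large project is greater than 20K SLOC


def summarize_productivity(projects: list) -> dict:
    """Baseline productivity statistics (SLOC/hour), split by project size.

    projects -- list of (developed_sloc, sloc_per_hour) pairs; rates must be > 0
    """
    groups = {"small": [], "large": []}
    for sloc, rate in projects:
        groups["large" if sloc > LARGE_PROJECT_SLOC else "small"].append(rate)
    summary = {}
    for size_class, rates in groups.items():
        if rates:  # skip a size class with no entries
            summary[size_class] = {
                "min": min(rates),
                "max": max(rates),
                "mean": mean(rates),
                "spread": max(rates) / min(rates),  # the "N to 1" factor
            }
    return summary


if __name__ == "__main__":
    # Hypothetical sizes paired with the two extremes quoted in the text
    print(summarize_productivity([(5_000, 0.5), (8_000, 10.8)]))
```

The two extremes quoted above, 0.5 and 10.8 l.o.c./hour, give a spread of 21.6 to 1 on their own.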

Reusing Code (Chart 11) 

As was stated in the introduction, projects being developed in the SEL environment typically utilize approximately 30 percent old code. Although it is obviously less costly to integrate existing code into a system than to generate new code, some cost must be attributed to adopting the old code: the development team must test, integrate, and possibly document it. By looking at approximately 25 projects ranging in size from 25,000 to over 100,000 total lines of code, and ranging in reused code from 0 percent to 70 percent, the SEL found that the expenditures of the 25 projects are best characterized by attributing an overhead cost of approximately 20 percent to reused code. The SEL now uses the 20 percent figure for estimating the cost of adopting existing code into a new software project.
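A minimal sketch of how the 20 percent figure can enter an estimate follows. It assumes, as the Chart 11 axis labels suggest but the text does not spell out, that each reused line is counted as 0.2 of a newly developed line; the function names and the 60,000-line example are hypothetical.

```python
REUSE_COST_FACTOR = 0.20  # SEL estimate: reused code costs ~20% of new code


def developed_sloc(new_sloc: float, reused_sloc: float,
                   factor: float = REUSE_COST_FACTOR) -> float:
    """Equivalent size for cost estimation: new lines plus weighted reused lines."""
    return new_sloc + factor * reused_sloc


def split_by_reuse(total_sloc: float, reuse_fraction: float) -> tuple:
    """Split a total size into (new, reused) lines for a given reuse fraction."""
    reused = total_sloc * reuse_fraction
    return total_sloc - reused, reused


if __name__ == "__main__":
    # Hypothetical 60,000-line system with the typical 30 percent reuse
    new, reused = split_by_reuse(60_000, 0.30)
    print(developed_sloc(new, reused))
```

Here the 18,000 reused lines add only about 3,600 lines' worth of effort, for roughly 45,600 developed lines; that equivalent size could then feed an effort equation such as E = aL^b.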





Error Characteristics (Chart 12) 


One of the other characteristics of a software environment that is of great concern to developers and managers is expected software reliability, together with overall software error characteristics. Before attempting to improve software reliability or to minimize the impact that software errors may have, the SEL first had to understand the error characteristics of the typical applications software in the SEL environment.

By collecting detailed error report data and monitoring numerous applications projects, the SEL has studied many error characteristics.

Several pieces of information, which are depicted in Chart 12 and are based on 1381 error reports from approximately 15 projects, include:

o Most errors are local to one component (subroutine or function) 

o Less than 10 percent of errors were attributed to faulty 
requirements 

o A large percentage of errors (48 percent) were estimated to be trivial
to correct (less than 1 hour)

o A very low percentage of errors (7 percent) were estimated to require a
major effort to fix (greater than 3 days)

(Further statistics and more detailed explanations can be found in References 7 
and 8). 
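Percentages such as the 48 and 7 percent figures above are tallies of error reports by estimated correction effort. A minimal sketch of that tally follows, using the thresholds from the text (less than 1 hour, greater than 3 days). Reading "3 days" as three 8-hour working days is an assumption, and the label "moderate" for the middle class is hypothetical.

```python
from collections import Counter

HOURS_PER_DAY = 8  # assumption: "3 days" read as 3 eight-hour working days


def effort_class(hours_to_fix: float) -> str:
    """Bucket one error report by its estimated correction effort."""
    if hours_to_fix < 1:
        return "trivial"   # less than 1 hour
    if hours_to_fix > 3 * HOURS_PER_DAY:
        return "major"     # greater than 3 days
    return "moderate"      # hypothetical label for everything in between


def effort_profile(fix_hours: list) -> dict:
    """Percentage of error reports falling in each effort class."""
    counts = Counter(effort_class(h) for h in fix_hours)
    return {label: 100.0 * n / len(fix_hours) for label, n in counts.items()}


if __name__ == "__main__":
    print(effort_profile([0.5, 0.5, 2, 30]))  # hypothetical fix estimates
```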




Models (Charts 13 through 16)

A second set of studies that the SEL has actively pursued is that of evaluating, reviewing, and developing software models. This includes resource models and reliability models, as well as complexity metrics.

Measures for Software (Chart 14)

The SEL has attempted to utilize various available software metrics to characterize the software products generated. Such metrics as McCabe Cyclomatic Complexity, Halstead Length, and lines of code were only a few of the measures that were reviewed.
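For reference, the two named metrics have simple textbook definitions: McCabe's cyclomatic complexity is V(G) = e - n + 2p for a control-flow graph with e edges, n nodes, and p connected components, and Halstead length is N = N1 + N2, the total number of operator and operand occurrences. The sketch below shows only those formulas. How a module is turned into a graph or a token stream is left out, the tokenization in the usage lines is hypothetical, and the SEL's own counting rules for FORTRAN are not described here.

```python
def cyclomatic_complexity(num_edges: int, num_nodes: int, num_components: int = 1) -> int:
    """McCabe's V(G) = e - n + 2p for a control-flow graph."""
    return num_edges - num_nodes + 2 * num_components


def halstead_length(operators: list, operands: list) -> int:
    """Halstead length N = N1 + N2: total occurrences of operators and operands."""
    return len(operators) + len(operands)


if __name__ == "__main__":
    # IF-THEN-ELSE: nodes = decision, then-branch, else-branch, join (4);
    # edges = decision->then, decision->else, then->join, else->join (4)
    print(cyclomatic_complexity(num_edges=4, num_nodes=4))
    # Hypothetical tokenization of the FORTRAN statement  X = X + 1
    print(halstead_length(["=", "+"], ["X", "X", "1"]))
```

The usage lines give V(G) = 2 for a single two-way branch and N = 5 for the one-line assignment.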

It is commonly believed that the size or the complexity of a component will be directly correlated with the reliability of that component. One set of studies performed in the SEL attempted to verify this belief. Using over 600 modules which had very detailed records of error data, the SEL computed the correlations among 4 characteristics of the components: total lines of code, executable lines of code, Cyclomatic Complexity, and Halstead Length. The resultant correlations, depicted in Chart 14, show a very high direct correlation among the 4 measures.
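The text does not say which correlation coefficient was used. Assuming the ordinary Pearson coefficient, a lower-triangular table like the one in Chart 14 can be produced as sketched below. Here `measures` maps each measure name to its per-module values; the names and sample values are hypothetical, and a measure with zero variance would cause a division by zero.

```python
import math


def pearson(xs: list, ys: list) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(measures: dict) -> dict:
    """Lower-triangular table {(row, column): r}, in the layout of Chart 14."""
    names = list(measures)
    return {
        (row, col): pearson(measures[row], measures[col])
        for i, row in enumerate(names)
        for col in names[: i + 1]
    }


if __name__ == "__main__":
    sample = {  # hypothetical per-module values
        "total lines": [120, 340, 90, 510],
        "executable lines": [80, 250, 60, 400],
        "cyclomatic complexity": [6, 15, 4, 22],
    }
    for pair, r in correlation_matrix(sample).items():
        print(pair, round(r, 2))
```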

A second study was performed in which the error rate of each of the components was plotted against size as well as against Cyclomatic Complexity. The SEL expected to show that larger components have higher error rates than smaller components, and that components with higher complexity ratings have higher error rates. The plots on Chart 14 show that the results were counterintuitive: the SEL has been unable to verify that larger or more complex components indeed have higher error rates.
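One simple way to repeat this kind of comparison is sketched below. It assumes that error rate means errors per 1,000 lines and that modules are split at the median size; the SEL's actual plots used their own binning, which is not described here. The sample data are hypothetical and chosen only to exercise the function.

```python
from statistics import mean, median


def error_rates_by_size(modules: list) -> tuple:
    """Mean errors per 1,000 lines for modules at or below the median size
    and for modules above it.

    modules -- list of (lines_of_code, error_count) pairs, lines_of_code > 0
    """
    cut = median(loc for loc, _ in modules)
    small = [1000 * errors / loc for loc, errors in modules if loc <= cut]
    large = [1000 * errors / loc for loc, errors in modules if loc > cut]
    if not large:
        raise ValueError("no modules above the median size; nothing to compare")
    return mean(small), mean(large)


if __name__ == "__main__":
    print(error_rates_by_size([(50, 2), (100, 2), (400, 2), (800, 2)]))
```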

Cost Models (Chart 15)

In addition to the studies of various measures for software, the SEL has also used the cost data collected from the many projects to calibrate and evaluate various available resource estimation models. No attempt was made to qualify one model as being any better than another. The objective of the studies was to better understand the sensitivities of the various models and to determine which models seemed to characterize the SEL software development environment most consistently.

In studying these resource models, 4 projects which were somewhat similar in size were used as experimental projects. Each of the models was fed complete and accurate data from the SEL data base, and each was calibrated with nominal sets of projects as completely as the experimenters could. Summary results, which are given in Chart 15, indicate that, occasionally, some models can accurately predict the effort required for a software project. The SEL has reiterated what many other software developers and managers claim: cost models should never be used as the sole source of estimation. The user must have access to experienced personnel for estimating and must also have access to a corporate memory which can be used to calibrate and reinforce someone's estimate of cost. Resource models can only be used as a supplemental tool to reinforce one's estimate or to flag possible inconsistencies.
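Chart 15 reports each model's accuracy as a percentage of error in prediction. The sketch below assumes the usual definition: predicted minus actual, divided by actual, so positive values are overestimates. The chart itself does not state its sign convention. The function names are illustrative.

```python
def percent_error(predicted: float, actual: float) -> float:
    """Signed prediction error as a percentage of actual effort.

    Positive values are overestimates, negative values underestimates.
    """
    return 100.0 * (predicted - actual) / actual


def mean_absolute_percent_error(pairs: list) -> float:
    """Average magnitude of percent error over (predicted, actual) pairs."""
    errors = [abs(percent_error(p, a)) for p, a in pairs]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    print(percent_error(110, 100))  # hypothetical: predicted 110 MM, actual 100 MM
    print(mean_absolute_percent_error([(110, 100), (85, 100)]))
```

Under this convention, a model that predicted 110 staff-months for a project that actually took 100 would appear in such a table as +10. Averaging the magnitudes is one way to compare models across projects, but as the text stresses, no single summary makes a model a sole source of estimation.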

More detailed information on the SEL studies can be found in References 1, 9, 10, and [illegible].

Reliability Models (Chart 16)

Another type of model that the SEL has spent some effort in understanding and calibrating is the reliability model. Although numerous approaches have been suggested as to just how one best predicts the level of error proneness that software may have, the SEL has performed extended studies on only one model: that attributed to John Musa. The model is a maximum likelihood method, and the SEL attempted to apply detailed fault reports from 2 separate projects to the model to determine if the model could accurately predict the remaining faults in the software.

Chart 16 indicates that one of the experiments was quite successful and the other was not. It should be noted that during and after these experiments, John Musa reviewed the results and the data very carefully, and he has pointed out some possible deficiencies in the SEL data which could lead to erroneous results in this application of the reliability model. One such piece of data is the granularity with which computer CPU time is recorded between reported faults; the SEL data is not as accurate as the model calls for.

The chart shows that for experiment 1, the model quite accurately predicted a level of reliability after approximately 1/2 of the total uncovered faults were reported. The chart also shows that for experiment 2, the model was still predicting a very high number of errors remaining in the software, when in fact only a minimal set were ever uncovered during the several years of operation for that system.

More detailed discussions can be found in References 1 and 11.





Methodologies (Charts 17 through 20)

As was mentioned earlier, one of the major objectives of the SEL has been to measure the effectiveness of various software development methodologies. The SEL has applied selected development approaches to different applications software tasks and then analyzed the process and product to study the relative impact of each approach. A summary of some of the results of the experimentation process is presented here.

Use of an Independent Verification and Validation Team (Chart 18)

Many software managers, developers, and organizations have advocated the use of an independent verification and validation (IV&V) team during the software development process. The major advantage of following such an approach, it is claimed, is an improvement in software reliability, quality, and visibility, but not necessarily an improvement in overall software productivity.

In an attempt to evaluate the impact that the use of an IV&V team may have in the SEL environment, 3 candidate projects were selected to apply the IV&V methodology. Two of the projects were very typical flight dynamics systems, each containing over 50,000 lines of code, while the third was a smaller flight dynamics project comprising about 10,000 lines of code. In addition to the IV&V approach being applied to the projects, the development teams used the standards and approaches normally followed by development efforts within the SEL environment.

The projects lasted approximately 18 months, and the IV&V effort was active for the entire duration of each project. The size of the IV&V effort was about 18 percent of the effort of each of the large development efforts. Near the beginning of the experiment, the SEL defined a series of measures that would be used to determine whether or not the application of the IV&V approach was cost effective in the SEL environment.

A summary of some of the measures is depicted in Chart 18. The results here indicate:


o total cost of the project increased, as expected

o productivity of the development teams (not counting the cost of
IV&V) was among the lowest of any previously SEL-monitored project

o the rate of uncovering errors earlier in the development cycle
was better

o the cost to fix all discovered errors was no lower than in other
SEL projects

o reliability of the software (error rate during acceptance testing
and during maintenance and operations) was no different than in other SEL projects





The conclusion of the SEL, based on these 3 experiments, was that the IV&V methodology was not an effective approach in the SEL environment.

(A more detailed description can be found in Reference 12.)

Effects of MPP on Software Development (Chart 19) 

In an attempt to determine if the utilization of Modern Programming Practices
(MPP) has any impact (either favorable or unfavorable) on the development of 
software, a set of 10 fairly large (between 50,000 l.o.c. and 120,000 l.o.c.), 
and fairly similar projects (same development environment, same type of 
requirements, same time constraints) was closely examined. These projects all 
had been developed in the SEL environment where detailed information was 
extracted from the projects weekly and where each project had a different level 
of MPP enforced during the development process. 

The MPPs ranged from various design approaches (such as PDL, design walk-throughs, etc.) to code and test methodologies (such as structured code, code reading, etc.) to various integration and system testing approaches. Each of the possible MPPs was rated and scaled according to the level to which the practice was followed on each project (the rating was done by the SEL researchers, not by the software developers). The only purpose of this exercise was to depict trends, not to prove that any one single practice was more effective by itself than any other.

The levels to which MPPs were utilized were plotted against productivity and against error rate. Chart 19 indicates that the application of MPPs favorably affected productivity by about 15 percent in these experiments. The results for software reliability vs. MPP are very questionable, and the chart shown is obviously very inconclusive; the SEL is continuing its analysis of additional data.

(More details of this effort can be found in Reference 13). 
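The text does not describe the rating scale or how the per-practice ratings were combined into a project score. As a sketch under those unstated details, the code below assumes that each practice receives a numeric rating and takes the mean as a project's MPP index. It then fits an ordinary least-squares line of productivity against the index, which is one way to read a trend such as the one in Chart 19. Practice names and values are hypothetical.

```python
from statistics import mean


def mpp_index(ratings: dict) -> float:
    """Hypothetical MPP index: the mean of one project's per-practice ratings."""
    return mean(list(ratings.values()))


def least_squares_fit(xs: list, ys: list) -> tuple:
    """Ordinary least-squares line y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


if __name__ == "__main__":
    project = {"PDL": 4, "design walk-throughs": 3, "code reading": 2}
    print(mpp_index(project))
    # Hypothetical (index, developed lines of code per hour) pairs
    print(least_squares_fit([1.0, 2.0, 3.0, 4.0], [2.9, 3.1, 3.2, 3.4]))
```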

Subjective Summary of Effective Practices (Chart 20) 

The previous chart indicated that productivity can be improved by an appreciable amount if certain select practices are applied to the software development process. The obvious next question is: which practices are the most effective? The SEL has been attempting to analyze the available data from the 40 experiments it has conducted to answer this very question. As was stated earlier, the SEL feels that these types of experiments can only depict trends and cannot accurately isolate the effect of any one practice on its own. Whether this can be done at all, or whether one should ever attempt it, is questionable.

Most software development methodologies represent an integrated set of practices that are effective only when they are applied in a combined, uniform fashion. Most practices do not make sense, or at least cannot be effective, as stand-alone approaches.





A summary of the trends that the SEL has discovered from the specific experiments conducted is presented in Chart 20. This chart is a combination of experimental results and subjective information from the experimenters and users, and it should be viewed only as depicting trends in various approaches. No numerical value of impact can realistically be assigned to the individual practices tested. It seems that practices such as PDL, code reading, and the use of a librarian have proved most beneficial, while such techniques as automated flow charters, requirements languages, and the axiomatic design approach have been unsuccessful in the SEL.

Cost of Data Collection (Chart 21)

The SEL has been in existence for about 7 years and has been collecting detailed software development data for over [illegible] years. Numerous experiments have been conducted in an attempt to understand and measure various methodologies for developing software. In support of these efforts, one of the most critical and difficult elements of the entire experimentation process is that of data collection.

The data collection process is time consuming, frustrating, sometimes unrewarding, and most assuredly expensive. Chart 21 shows the overhead cost that the SEL has experienced over the past 6 years. To accurately collect data from the development tasks, the SEL finds that there is a 3 to 7 percent overhead price on the development effort. To process the data that has been collected (verification, encoding, data entry, storage, etc.), the SEL has spent approximately an additional 10 to 12 percent of the development effort. Finally, the SEL's experience indicates that one can spend up to an additional 25 percent of the development effort to perform the detailed analysis of the data that has been collected. This includes support before, during, and after the experiments in defining the data to be collected, monitoring the development data and effort, formulating hypotheses, and performing analysis of the completed experiments. The product of the analysis consists of papers, reports, and documents.

(Detailed information on cost can be found in Reference 2). 
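Adding the three cost components gives the total budget a measurement program might plan for. The sketch below uses the percentage ranges shown in Chart 21. Because the chart gives only an upper bound for analysis ("up to 25%"), it assumes 0 percent as the lower bound. The names are illustrative.

```python
# Percent of development effort, (low, high), from Chart 21
COST_RANGES = {
    "overhead to tasks": (3, 7),
    "data processing": (10, 12),
    "analysis of information": (0, 25),  # chart says "up to 25%"; 0 assumed as floor
}


def data_collection_cost(development_effort: float) -> tuple:
    """(low, high) cost of the data program, in the same units as the effort."""
    low_pct = sum(low for low, _ in COST_RANGES.values())
    high_pct = sum(high for _, high in COST_RANGES.values())
    return development_effort * low_pct / 100, development_effort * high_pct / 100


if __name__ == "__main__":
    print(data_collection_cost(100))  # effort in staff-months
```

For a 100 staff-month development effort, this gives a range of 13 to 44 staff-months. The low end covers collection and processing only.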

Summary (Chart 22) 

In summary, the SEL has had much experience with the data collection process and with the experimentation process. Many of its attempts have been rewarding and many have been fruitless, but the SEL feels that attempts to assess approaches to software have to be conducted if we are ever to evolve to a more productive approach to developing software.





REFERENCES


1. SEL-81-104, The Software Engineering Laboratory, D. N. Card, F. E. McGarry, G. Page, et al., February 1982

2. SEL-81-101, Guide to Data Collection, V. E. Church, D. N. Card, F. E. McGarry, et al., August 1982

3. SEL-81-102, Software Engineering Laboratory (SEL) Data Base Organization and User's Guide, D. C. Wyckoff, G. Page, F. E. McGarry, et al., March 1983

4. Zelkowitz, M. V., "Resource Estimation for Medium Scale Software Projects," Proceedings of the Twelfth Conference on the Interface of Statistics and Computer Science, New York: Computer Societies Press, 1979

5. Bailey, J. W., and V. R. Basili, "A Meta-Model for Software Development Resource Expenditures," Proceedings of the Fifth International Conference on Software Engineering, New York: Computer Societies Press, 1981

6. Basili, V. R., and K. Freburger, "Programming Measurement and Estimation in the Software Engineering Laboratory," Journal of Systems and Software, Volume 2, No. 1, February 1981

7. SEL-81-011, Evaluating Software Development by Analysis of Change Data, D. M. Weiss, November 1981

8. Basili, V. R., and B. T. Perricone, Software Errors and Complexity: An Empirical Investigation, University of Maryland, Technical Report TR-1195, August 1982

9. SEL-80-007, An Appraisal of Selected Cost/Resource Estimation Models for Software Systems, J. F. Cook, F. E. McGarry, December 1980

10. Basili, V. R., "Software Engineering Laboratory Relationships for Programming Measurement and Estimation," University of Maryland, Technical Memorandum, October 1979

11. SEL-80-005, A Study of the Musa Reliability Model, A. M. Miller, November 1980

12. SEL-81-110, Performance and Evaluation of an Independent Software Verification and Integration Process, G. Page, F. McGarry, September 1982

13. SEL-82-001, Evaluation of Management Measures of Software Development, D. Card, G. Page, F. McGarry, September 1982




MEASURING 

SOFTWARE DEVELOPMENT 

TECHNOLOGY 

OR 

SHOULD PROGRAMMERS DO IT 

TOP DOWN?




CHART 1 





SEL APPROACH TO SOFTWARE 
TECHNOLOGY ASSESSMENT 


SOFTWARE EXPERIMENTS IN PRODUCTION ENVIRONMENT: 

NASA APPLICATIONS 


• DEVELOP PROFILE OF ENVIRONMENT 
(SCREENING) 


• EXPERIMENT WITH PROPOSED 
TECHNOLOGIES (CONTROLLED) 


• MEASURE IMPACT AND/OR ASSESS 
TECHNOLOGIES 


- EXTRACT DETAILED DEVELOPMENT 
DATA 

- DETERMINE CHARACTERISTICS OF 
DEVELOPMENT PROCESS 

- APPLY VARIOUS TECHNOLOGIES 
(METHODS, MODELS, AND TOOLS) TO 
APPLICATIONS PROGRAMS 

- EXTRACT DETAILED DEVELOPMENT 
DATA 

- DEFINE MEASURES FOR EVALUATION 

- COMPARE EFFECTS OF USING OR NOT 
USING APPROACHES IN QUESTION 
(SIMILAR PROJECTS) 

- DETERMINE EFFECTIVENESS OF 
TECHNOLOGIES IN QUESTION (WHICH 
ONES HELP AND BY HOW MUCH) 


CHART 2 









SOFTWARE ENVIRONMENT


DEVELOPMENT LANGUAGE    FORTRAN (15% MACROS)
SOFTWARE TYPE           SCIENTIFIC, GROUND-BASED INTERACTIVE,
                        NEAR-REAL-TIME
SIZE                    TYPICALLY ~60,000 SLOC (2,000 TO 110,000)
DEVELOPMENT TIME        16 TO 24 MONTHS (START DESIGN TO START
                        OPERATIONS)
STAFFING                6 TO 14 PERSONS
DEVELOPMENT SYSTEM      IBM S/360 (PRIMARILY), VAX-11/780,
                        PDP-11/70


CHART 3





EXPERIMENTS WITHIN THE SEL
1977 THROUGH 1982
BASIS FOR SUMMARY INFORMATION AND CONCLUSIONS


LABORATORY EXPERIMENTS              46 PROJECTS
INFORMATION MONITORED               1.8 MILLION SLOC
PROGRAMMERS/MANAGERS REPRESENTED    150 PEOPLE
DATA EXTRACTED                      20,000 FORMS
METHODOLOGIES APPLIED               200 QUALIFYING PARAMETERS AND VARIOUS
                                    MODELS, TOOLS, AND METHODOLOGIES


CHART 4





AREAS OF DISCUSSION 


• PROFILES 

• MODELS 

• METHODOLOGIES 




CHART 5 





PROFILES 




CHART 6 





WHERE DO 

PROGRAMMERS SPEND THEIR TIME? 


DATE DEPENDENT PROGRAMMER REPORTING 





CHART 7 





PROFILES OF DEVELOPMENT RESOURCES

[Chart: staffing profiles, including a RAYLEIGH CURVE, plotted against TIME across the DESIGN, CODE AND UNIT TESTING, SYSTEM TESTING, and ACCEPTANCE TESTING phases, with PRODUCTIVITY (SLOC/HOUR) and RELIABILITY (ERRORS/K SLOC) given for each profile; productivity values shown: 4.4-4.6, 2.7-4.7, and 2.7-2.9 SLOC/hour; reliability values shown: up to 2 errors/K SLOC]

• RELATIONSHIP BETWEEN PROFILE AND PRODUCTIVITY

• NO RELATIONSHIP BETWEEN PROFILE AND RELIABILITY


CHART 8



ARE LARGE PROGRAMS
HARDER TO BUILD THAN SMALL ONES?

[Plot: STAFF-MONTHS OF EFFORT vs. DEVELOPED LINES OF CODE (THOUSANDS)]


CHART 9





PRODUCTIVITY VARIATION (SLOC/HOUR)*

[Panels: BY PROJECT (ALL CHARGES); BY PERSON (PROGRAMMER ONLY)]

PEOPLE ARE THE MOST IMPORTANT METHODOLOGY

*LARGE PROJECT IS GREATER THAN 20K SLOC.


CHART 10



[Plot: DEVELOPED SLOC vs. NEW CODE (%), 0 TO 100]


CHART 11




ERROR CHARACTERISTICS

(MEASURED DURING IMPLEMENTATION)

[Panels: TYPES OF ERRORS; EFFORT TO CORRECT]

SAMPLE OF 1311 REPORTS

• MOST ERRORS ARE EASY TO CORRECT

• SEVERAL-COMPONENT ERRORS ARE LESS THAN EXPECTED

• REQUIREMENTS ERRORS ARE LESS THAN EXPECTED


CHART 12




MODELS 



CHART 13 


SOFTWARE MEASURES IN THE SEL

[Plots: RELIABILITY (ERRORS PER LINE OF CODE) vs. McCABE COMPLEXITY and vs. LINES OF CODE]

CORRELATIONS

                     TOTAL   EXECUTABLE   McCABE       HALSTEAD
                     LINES   LINES        COMPLEXITY   LENGTH
TOTAL LINES          1.00
EXECUTABLE LINES     0.84    1.00
McCABE COMPLEXITY    0.81    0.87         1.00
HALSTEAD LENGTH      0.85    0.91         0.91         1.00

SAMPLE OF 688 MODULES


CHART 14




COMPARISON OF COST MODELS

           ACTUAL        PERCENTAGE OF ERROR IN PREDICTION
PROJECT    EFFORT (MM)   DOTY    PRICE S   TECOLOTE   SEL    COCOMO
1          79            +65     +8        -4         -6     --
2          96            +30     +6        25         -22    +1
3          40            +65     +6        -8         +93    --
5          98            +74     0         +3         -2     +2
6          116           +123    36        +35        -3     --
7          91            +52     14        -12        -14    --
8          99            +127    +7        +36        +14    +53
9          106           --      --        --         -24    +16

SOMETIMES, SOME MODELS WORK WELL


CHART 15





PREDICTING RELIABILITY 

(MUSA MAXIMUM LIKELIHOOD METHOD) 



CHART 16 





METHODOLOGIES 




CHART 17 



A LOOK AT IV&V METHODOLOGY

(BASED ON RESULTS FROM 3 EXPERIMENTS)

[Plots: ERRORS/K ExLOC; MAN-MONTHS/K SLOC]

• IF YOU MULTIPLY ERRORS FOUND EARLY BY A LATENCY
FACTOR, IV&V LOOKS GOOD

• IF YOU EXAMINE ALL MEASURES, IV&V LOOKS BAD


CHART 18



EFFECTS OF MPP

ON SEL SOFTWARE DEVELOPMENT

[Plots: PRODUCTIVITY (DEVELOPED LINES OF CODE PER HOUR) vs. INDEX OF MODERN PROGRAMMING PRACTICES; ERROR RATE vs. INDEX OF MODERN PROGRAMMING PRACTICES]

• PRODUCTIVITY IS ABOUT 15 PERCENT HIGHER

• RELIABILITY IS HIGHLY VARIABLE


CHART 19



OVERHEAD COST 


WHAT HAS BEEN SUCCESSFUL IN OUR ENVIRONMENT? 



CHART 20 




COST OF DATA COLLECTION

(AS A PERCENTAGE OF TASKS BEING MEASURED)


                                        SEL EXPERIENCES

OVERHEAD TO TASKS (EXPERIMENTS)         3-7%
• FORMS
• MEETINGS
• TRAINING
• INTERVIEWS
• COST OF USING TOOLS

DATA PROCESSING                         10-12%
• COLLECTING/VALIDATING FORMS
• ARCHIVING/ENTERING DATA
• DATA MANAGEMENT AND REPORTING

ANALYSIS OF INFORMATION                 UP TO 25%
• DESIGNING EXPERIMENTS
• EVALUATING EXPERIMENTS
• DEFINING ANALYSIS TOOLS


CHART 21







SUMMARY 


• DATA COLLECTION IS EXPENSIVE - BUT 
VERY, VERY IMPORTANT 

• WE MUST UNDERSTAND WHERE WE ARE 
BEFORE HEADING SOMEWHERE ELSE 

• EXPERIMENTATION WILL PAY FOR ITSELF (TRY 
SOMETHING NEW) 

• MPP CAN FAVORABLY IMPACT PRODUCTIVITY 
AND RELIABILITY 

• SOME METHODOLOGIES BUY YOU NOTHING 
(OR EVEN WORSE) 

• MODELS MUST BE UTILIZED WITH GREAT 
CARE 


CHART 22 





PANEL #1 


THE SOFTWARE ENGINEERING LABORATORY (SEL) 

V. Basili, University of Maryland 
A. Goel, Syracuse University 
M. Zelkowitz, University of Maryland 




SOFTWARE ERRORS AND COMPLEXITY:
AN EMPIRICAL INVESTIGATION


Victor R. Basili and Barry T. Perricone
Department of Computer Science
University of Maryland
College Park, Md.

1982 


ABSTRACT 


The distributions and relationships derived from the change data collected during the development of a medium scale satellite software project show that meaningful results can be obtained which allow an insight into software traits and the environment in which the software is developed. Modified and new modules were shown to behave similarly. An abstract classification scheme for errors, which allows a better understanding of the overall traits of a software project, is also shown. Finally, various size and complexity metrics are examined with respect to errors detected within the software, yielding some interesting results.


V. Basili 
U of M 
1 of 49 



1.0 INTRODUCTION 


The discovery and validation of fundamental relationships between the development of computer software, the environment in which the software is developed, and the frequency and distribution of errors associated with the software are topics of primary concern to investigators in the field of software engineering. Knowledge of such relationships can be used to provide an insight into the characteristics of computer software and the effects that a programming environment can have on the software product. In addition, it can provide a means to improve the understanding of the terms reliability and quality with respect to computer software. In an effort to acquire knowledge of these basic relationships, change data for a medium scale software project was analyzed (here, change data is any documentation which reports an alteration made to the software for a particular reason).

In general, the overall objectives of this paper are threefold: first, to report the results of the analyses; second, to review the results in the context of those reported by other researchers; and third, to draw some conclusions based on the aforementioned. The analyses presented in this paper encompass various types of distributions based on the collected change data, the most important of which are the error distributions observed within the software project.

In order for the reader to view the results reported in this paper properly, it is important that the terms used throughout this paper and the environment in which the data was collected are clearly defined. This is pertinent since many of the terms used within this paper have appeared in the general literature, often to denote different concepts. Understanding the environment will allow the partitioning of the results into two classes: those which are dependent on and those which are independent of a particular programming environment.


1.1 DESCRIPTION OF THE ENVIRONMENT 


The software analyzed within this paper is one of a large set of projects being analyzed in the Software Engineering Laboratory (SEL). The particular project analyzed in this paper is a general purpose program for satellite planning studies. These studies include, among others: mission maneuver planning; mission lifetime; mission launch; and mission control. The overall size of the software project was approximately 90,000 source lines of code. The majority of the software project was coded in FORTRAN. The system was developed and executes on an IBM 360.

V. Basili 
U of M 
2 of 49 





The developers of the analyzed software had extensive experience with ground support software for satellites. The analyzed system represents a new application for the development group, although it shares many similar algorithms with the group's existing systems.

The requirements for the analyzed system also kept growing and changing, much more so than for the ground support software the group normally built. Because of the commonality of algorithms with existing systems, the developers reused the design and code for many algorithms needed in the new system. Hence a large number of reused (modified) modules became part of the new system analyzed here.

An approximation of the analyzed software's life cycle is displayed in Figure 1. This figure illustrates only the approximate duration of the various phases of the software's life cycle. The information relating the amount of manpower to each of the phases shown was not specific enough to yield meaningful results, so it was not included.


LIFE CYCLE OF ANALYZED SOFTWARE
Figure 1


1.2 TERMS

This section presents the definitions and associated contexts for the terms used within this paper. A discussion of the concepts involved with these terms is also given when appropriate.


Module: A module is defined as a named subfunction, subroutine, or the main program of the software system. This definition is used since only segments written in FORTRAN which contained executable code were used for the analyses.

Change data from the segments which constituted the data 
blocks, assembly segments, common segments, or utility rou- 
tines were not included. However, a general overview of the 
data available on these types of segments is presented in 
Section 4.0 for completeness. 

There are two types of modules referred to within this paper. The first type is denoted as modified. These are modules which were developed for previous software projects and then modified to meet the requirements of the new project. The second type is referred to as new. These are modules which were developed specifically for the software project under analysis.

The entire software project contained a total of 517 code segments: 36 assembly segments, 370 FORTRAN segments, and 111 segments that were either common modules, block data, or utility routines. The number of code segments which met the adopted module definition was 370 out of 517, which is 72% of the total and constitutes the majority of the software project. Of the modules found to contain errors, 49% were categorized as modified and 51% as new modules.


Number of Source and Executable Lines: The number of source lines within a module refers to the number of lines of executable code and comment lines contained within it. The number of executable lines within a module refers to the number of executable statements; comment lines are not included.

Some of the relationships presented in this paper are 
based on a grouping of modules by module size in increments 
of 50 lines. This means that a module containing 50 lines 
of code or less was placed in the module size of 50; modules 
between 51 and 100 lines of code into the module size of 
100, etc. The number of modules which were contained in 
each module size is given in Table 1 for all modules and for 
modules which contained errors (i.e., a subset of all 
modules) with respect to source and executable lines of 
code. 
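The 50-line grouping rule above can be written as a small function. This is a minimal Python sketch; the name `size_class` and its `width` parameter are hypothetical, and a module with 0 lines is assumed to fall in the first class, matching the "0-50" row of Table 1.

```python
def size_class(lines: int, width: int = 50) -> int:
    """Return the size class (upper bound) for a module with `lines` lines.

    Hypothetical helper: 0-50 lines -> 50, 51-100 -> 100, 101-150 -> 150, ...
    """
    if lines < 0:
        raise ValueError("line count cannot be negative")
    if lines <= width:
        return width  # the first class also absorbs modules with 0 lines
    # Ceiling division rounds the count up to the next multiple of `width`.
    return -(-lines // width) * width


if __name__ == "__main__":
    for n in (0, 50, 51, 100, 101, 237):
        print(n, "->", size_class(n))
```

The same function applies to source or executable line counts; only the input changes.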


                          Number of Modules
                    All Modules         Modules with Errors
Number of Lines   Source  Executable    Source  Executable
0-50                 53       258           3        49
51-100              107        70          16        25
101-150              80        26          20        13
151-200              56        13          19         7
201-250              34         1          12         1
251-300              14         1           9         0
301-350               7         1           4         1
351-400               9         0           7         0
>400                 10         0           6         0
Total               370       370          96        96

Table 1


Error: Something detected within the executable code which caused the module in which it occurred to perform incorrectly (i.e., contrary to its expected function).

Errors were quantified from two viewpoints in this paper, depending upon the goals of the analysis of the error data. The first quantification was based on a textual rather than a conceptual viewpoint. This type of error quantification is best illustrated by an example. If a '*' was incorrectly used in place of a '+', then each occurrence of the '*' is counted as an error. This is the case even if the '*'s appear on the same line of code or within multiple modules. The total number of errors detected in the 370 software modules analyzed was 215, contained within a total of 96 modules, implying that 26% of the modules analyzed contained errors.

The second type of quantification was used to measure the effect of an error across modules: textual errors associated with the same conceptual problem were combined to yield one conceptual error. Thus, in the example above, all incorrectly used '*'s replaced by '+'s in the same formula were combined, and the total number of modules affected by that error is listed. This is done only for the errors reported in Figure 2. There are a total of 174 conceptual errors. All other studies in this paper are based upon the first type of quantification described.
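The two quantifications can be illustrated with a short Python sketch. The record layout (a conceptual error holding a per-module count of its textual occurrences) and the module names in the example are hypothetical; the paper does not describe how the counts were stored.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptualError:
    """One underlying problem, e.g. '*' used where '+' was intended.

    `occurrences` maps a (hypothetical) module name to the number of
    textual occurrences of the problem in that module.
    """
    description: str
    occurrences: dict = field(default_factory=dict)


def textual_count(errors):
    """Textual view: every occurrence counts, even on one line or across modules."""
    return sum(sum(e.occurrences.values()) for e in errors)


def conceptual_count(errors):
    """Conceptual view: occurrences of the same problem collapse into one error."""
    return len(errors)


def modules_affected(error):
    """Number of modules that had to be changed to fix one conceptual error."""
    return sum(1 for n in error.occurrences.values() if n > 0)


errors = [
    ConceptualError("'*' used for '+'", {"MODULE_A": 2, "MODULE_B": 1}),
    ConceptualError("wrong loop bound", {"MODULE_A": 1}),
]
print(textual_count(errors), conceptual_count(errors), modules_affected(errors[0]))
```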


Statistical Terms and Methods: All linear regressions of the data presented within this paper employed the least squares principle as the criterion of goodness of fit (i.e., "choose as the 'best fitting' line that one which minimizes the sum of squares of the deviations of the observed values of y from those predicted" [1]).

Pearson's product moment coefficient of correlation was used as an index of the strength of the linear relationship, independent of the respective scales of measurement for y and x. This index is denoted by the symbol r within this paper. The amount of variability in y accounted for by linear regression on x is denoted as r² within this paper.
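For readers who want to reproduce such fits, the least-squares line, Pearson's r, and r² can be computed directly from their textbook definitions. This Python sketch is not the authors' code (they relied on the formulas in [1]); the function name is hypothetical.

```python
import math


def least_squares_fit(xs, ys):
    """Fit y = b0 + b1*x by least squares; also return Pearson's r and r^2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary for r to be defined")
    b1 = sxy / sxx                  # slope minimizing the sum of squared deviations
    b0 = mean_y - b1 * mean_x       # the fitted line passes through the means
    r = sxy / math.sqrt(sxx * syy)  # scale-free strength of the linear relationship
    return b0, b1, r, r * r         # r^2: share of variability in y explained by x


if __name__ == "__main__":
    print(least_squares_fit([1, 2, 3, 4], [2, 4, 6, 8]))
```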

All of the equations and explanations for these statistics can be found in [1]. It should be noted that other types of curve fits were also applied to the data; the results of these fits are mentioned later in the paper.

Now that the software's environment and the key terms 
used within the paper have been defined and outlined, a dis- 
cussion of the basic quantification of the data collected, 
the relationships and distributions derived from this quan- 
tification, and the resulting conclusions are presented. 


2.0 BASIC DATA

The change data analyzed was collected over a period of 33 months, August 1977 through May 1980. These dates correspond in time to the software phases of coding, testing, acceptance, and maintenance (Figure 1). The data collected for the analyses is not complete, since changes are still being made to the software analyzed. However, it is felt that enough data was examined to make the conclusions drawn from it meaningful.

The change data was entered on detailed report sheets which were completed by the programmer responsible for implementing the change. A sample of the change report form is given in the Appendix. In general, the form required the programmer implementing the change to answer several short questions. These questions provided a means of documenting the cause of a change in addition to other characteristics and effects attributed to the change. The majority of this information was found useful in the analyses.

The key information from the form used in the study was: the date of the change or error discovery, the description of the change or error, the number of components changed, the type of change or error, and the effort needed to correct the error.

It should be mentioned that the change report form shown in the Appendix was the most current form but was not uniformly used over the entire period of this study. In actuality, there were three different versions of the change report form, not all of which required the same set of questions to be answered. Therefore, for data that was not present on one type of form but could be inferred, the inferred value was used. An example of such an inference is determining the error type: since the error description was given on all of the forms, the error type could be inferred with a reasonable degree of reliability. Data for which this inference was deemed unreliable was not incorporated into the data set used for a particular analysis. The reader should therefore be alert to the cardinality of the data set used as a basis for some of the relationships presented in this paper. A total of 231 change report forms were examined for the purpose of this paper.

The consistency and partial validity of the forms were checked in the following manner. First, the supervisor of the project looked over the change report forms and verified them (denoted by his or her signature and the date). Second, when the data was being reduced for analysis, it was closely examined for contradictions. It should be noted that interviews with the individuals who filled out the change forms were not conducted. This was the major difference between this work and other error studies performed by the Software Engineering Laboratory, where interviews were held with the programmers to help clarify questionable data [8].


The review of the change data as described above yielded an interesting result. Errors due to previous miscorrections were found to be three times as common after the form review process was performed: before the review process they accounted for 2% of the errors, and after the review process they accounted for 6%. These recording errors are probably attributable to the fact that the person correcting an error did not know its cause was a previous fix, because the fix had occurred several months earlier, or had been made by a different programmer, etc.


3.0 RELATIONSHIPS DERIVED FROM DATA


This section presents and discusses relationships derived from the change data.



3.1 CHANGE DISTRIBUTION BY TYPE


Changes to the software can be categorized as error corrections or modifications (specification changes, planned enhancements, clarity and optimization improvements). For this project, error corrections accounted for 62% of the changes and modifications for 38%. In studies of other SEL projects, error corrections ranged from 40% to 64% of the changes.


3.2 ERROR DISTRIBUTION BY MODULES

Figure 2 shows the effects of an error in terms of the number of modules that had to be changed. (Note that errors here are counted as conceptual errors.) It was found that 89% of the errors could be corrected by changing only one module. This is a good argument for the modularity of the software. It also shows that there is not a large amount of interdependence among the modules with respect to an error.


NUMBER OF MODULES AFFECTED BY AN ERROR
(data set: 211 textual errors, 174 conceptual errors)

# ERRORS      # MODULES AFFECTED
155 (89%)            1
  9                  2
  3                  3
  6                  4
  1                  5

Figure 2
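The totals in the Figure 2 header can be cross-checked against its rows. This Python sketch assumes, as the header totals imply, that a conceptual error affecting k modules contributes exactly k textual errors.

```python
# Figure 2: number of modules affected by an error -> number of conceptual errors.
FIGURE_2 = {1: 155, 2: 9, 3: 3, 4: 6, 5: 1}

conceptual_total = sum(FIGURE_2.values())
# Assumption: one textual error per affected module.
textual_total = sum(k * count for k, count in FIGURE_2.items())
single_module_share = FIGURE_2[1] / conceptual_total

print(conceptual_total, textual_total, f"{single_module_share:.0%}")
```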


Figure 3 shows the number of errors found per module. 
The type of module is shown in addition to the overall total 
number of modules found to contain errors. 


NUMBER OF ERRORS PER MODULE (data set: 215 errors)

# MODULES   NEW    MODIFIED   # ERRORS/MODULE
   36        17       19             1
   26        13       13             2
   16        10        6             3
   13         7        6             4
    4         1**      3*            5
    1         1**      0             7

*  the three most error prone modified modules
** the two most error prone new modules

Figure 3


The largest numbers of errors found in a single module were 7 (located in a single new module) and 5 (located in 3 different modified modules and 1 new module). The remainder of the errors were distributed almost equally between the two types of modules.

The effort associated with correcting an error is specified on the form as (1) 1 hour or less, (2) 1 hour to 1 day, (3) 1 day to 3 days, or (4) more than 3 days. These categories were chosen because it was too difficult to collect effort data at a finer granularity. To estimate the effort for any particular error correction, an average time was used for each category; assuming an 8-hour day, an error correction in category (1) was assumed to take 0.5 hour, category (2) 4.5 hours, category (3) 16 hours, and category (4) 32 hours.
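The conversion from effort category to estimated hours amounts to a lookup table. A minimal Python sketch: the category codes 1 to 4 follow the form's numbering, while the function names and the choice to average over individual errors (rather than over categories) are assumptions, since the text does not say how the averaging used for Tables 2 and 3 was weighted.

```python
# Representative hours for each effort category on the change report form
# (values as stated in the text, assuming an 8-hour working day).
EFFORT_HOURS = {
    1: 0.5,   # 1 hour or less
    2: 4.5,   # 1 hour to 1 day
    3: 16.0,  # 1 day to 3 days
    4: 32.0,  # more than 3 days
}


def estimated_hours(categories):
    """Total estimated correction effort for a list of category codes (1-4)."""
    return sum(EFFORT_HOURS[c] for c in categories)


def average_hours(categories):
    """Mean estimated effort per error (assumed per-error averaging)."""
    categories = list(categories)
    if not categories:
        raise ValueError("no errors given")
    return estimated_hours(categories) / len(categories)


if __name__ == "__main__":
    # Hypothetical group of three errors: two in category 2, one in category 3.
    print(estimated_hours([2, 2, 3]), average_hours([2, 2, 3]))
```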


The types of errors found in the three most error prone modified modules (marked * in Figure 3) and the effort needed to correct them are shown in Table 2. If any type contained error corrections from more than one effort category, the associated effort for them was averaged. The fact that most modules containing errors had between one and three errors shows that the number of errors per module was, on average, very small.

The twelve errors contained in the two most error prone new modules (marked ** in Figure 3) are shown in Table 3 along with the effort needed to correct them.


                                        NUMBER OF ERRORS   AVERAGE EFFORT
                                           (15 total)        TO CORRECT
misunderstood or incorrect
  specifications                               8              24 hours
incorrect design or implementation
  of a module component                        5              16 hours
clerical error                                 2              4.5 hours

EFFORT TO CORRECT ERRORS IN THREE MOST ERROR PRONE MODIFIED MODULES
Table 2


                                        NUMBER OF ERRORS   AVERAGE EFFORT
                                           (12 total)        TO CORRECT
misunderstood or incorrect
  requirements                                 8              32 hours
incorrect design or implementation
  of a module                                  3              0.5 hours
clerical error                                 1              0.5 hours

EFFORT TO CORRECT ERRORS IN THE TWO MOST ERROR PRONE NEW MODULES
Table 3



3.3 ERROR DISTRIBUTION BY TYPE

Figure 4 shows the distribution of errors by type. It can be seen that 48% of the errors were attributed to incorrect or misinterpreted functional specifications or requirements.

The classification of errors used throughout the Software Engineering Laboratory is given below. The person identifying the error indicates the class for each error.

A: Requirements incorrect or misinterpreted
B: Functional specification incorrect or misinterpreted
C: Design error involving several components
   1. mistaken assumption about value or structure of data
   2. mistake in control logic or computation of an expression
D: Error in design or implementation of single component
   1. mistaken assumption about value or structure of data
   2. mistake in control logic or computation of an expression
E: Misunderstanding of external environment
F: Error in the use of programming language/compiler
G: Clerical error
H: Error due to previous miscorrection of an error
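Tallying errors by SEL class and module type, as in Figure 4, needs little more than a table of class codes. A Python sketch under stated assumptions: the enum folds the C1/C2 and D1/D2 subclasses into C and D, the function names are hypothetical, and the sample records are invented for illustration. The 48% cited above corresponds to the combined share of classes A and B.

```python
from collections import Counter
from enum import Enum


class ErrorClass(Enum):
    """SEL error classes (subclasses 1 and 2 of C and D folded together)."""
    A = "requirements incorrect or misinterpreted"
    B = "functional specification incorrect or misinterpreted"
    C = "design error involving several components"
    D = "error in design or implementation of single component"
    E = "misunderstanding of external environment"
    F = "error in the use of programming language/compiler"
    G = "clerical error"
    H = "error due to previous miscorrection of an error"


def distribution(records):
    """Percent of errors per (class, module type); records are (ErrorClass, 'new'|'modified')."""
    counts = Counter(records)
    total = sum(counts.values())
    return {key: 100 * n / total for key, n in counts.items()}


def spec_share(records):
    """Percent of errors in class A or B (requirements or functional specification)."""
    records = list(records)
    hits = sum(1 for cls, _ in records if cls in (ErrorClass.A, ErrorClass.B))
    return 100 * hits / len(records)


sample = [  # hypothetical records, not data from the study
    (ErrorClass.B, "modified"),
    (ErrorClass.B, "modified"),
    (ErrorClass.A, "new"),
    (ErrorClass.D, "new"),
]
print(distribution(sample))
print(spec_share(sample))
```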


The distribution of these errors by source is plotted in Figure 4, with the subdistributions for new and modified modules displayed. This distribution shows that the majority of errors were the result of the functional specification being incorrect or misinterpreted. Within this category, the majority of the errors (24%) involved modified modules. This is most likely due to the fact that the reused modules were taken from another system with a different application. Thus, even though the basic algorithms were the same, the specification was not defined well enough, or appropriately enough, for the modules to be used under slightly different circumstances.


[Figure 4 image not reproduced; vertical axis: % errors observed]
Figure 4


[Figure 5 image not reproduced; horizontal axis: type of error]
SOURCES OF ERROR ON OTHER PROJECTS
Figure 5


The distribution in Figure 4 should be compared with the distribution for another system developed by the same organization, shown in Figure 5. Figure 5 represents a typical ground support software system and is representative of the error distributions for these systems. It differs from the distribution for the system discussed in this paper, however, in that the majority of its errors involved the design of a single component. The reason for the difference is that in ground support systems the design is well understood and the developers have had a reasonable amount of experience with the application; any reused design or code comes from similar systems, and the requirements tend to be more stable. Comparing the two distributions makes the differences between the development environments clear in a quantitative way.


The percentage of requirements and specification errors is consistent with the work of Endres [2]. Endres found that 46% of the errors he viewed involved the misunderstanding of the functional specifications of a module. Our results are similar even though Endres' analysis was based on data derived from a different software project and programming environment. The software project used in Endres' analysis contained considerably more lines of code per module, was written in assembly code, and was within the problem area of operating systems. However, both of the software systems Endres analyzed did contain new and modified modules.

Of the errors due to the misunderstanding of a module's specifications or requirements (48%), 20% involved new modules while 28% involved modified modules.

Although the use of modified modules can reduce the cost of coding, the effort needed to correct errors in modified modules might outweigh the savings. The effort graph (Figure 6) supports this viewpoint: 50% of the total effort required for error correction occurred in modified modules. Errors requiring from one day to more than three days to correct accounted for 45% of the total effort, with 27% of this effort attributable to modified modules within these greater effort classes. Thus, errors occurring in new modules required less effort to correct than those occurring in modified modules.



The similarity between Endres' results and those reported here tends to support the statement that, independent of the environment and possibly the module size, the majority of errors detected within software are due to an inadequate form or interpretation of the specifications. This seems especially true when the software contains modified modules.

In general, these observations tend to indicate that there are disadvantages in modifying a large number of existing modules to meet new specifications. Developing a new module might be the better alternative in some cases if good specifications do not exist for the existing modules.


3.4 OVERALL NUMBER OF ERRORS OBSERVED


Figure 7 displays the number of errors observed in both new and modified modules. The curve representing total modules (new and modified) is basically bell-shaped. One interpretation is that up to some point errors are detected at a relatively steady rate. At this point at least half of the total errors (detected and still undetected) have been observed, and the rate of discovery thereafter decreases. It may also imply that the maintainers are not adding too many new errors as the system evolves.

It can be seen, however, that errors occurring in modified modules are detected earlier and at a slightly higher rate than those in new modules. One hypothesis for this is that the majority of the errors observed in modified modules are due to the misinterpretation of the functional specifications, as mentioned earlier in the paper. Errors of this type are likely to be more blatant than errors of other types and would therefore be detected both earlier and more readily. (See next section.)





NUMBER OF ERRORS OCCURRING IN MODULES 
Figure 7 


3.5 ABSTRACT ERROR TYPES

An abstract classification of errors was adopted by the authors which classified errors into one of five categories with respect to a module: (1) initialization; (2) control structure; (3) interface; (4) data; and (5) computation. This was done to see whether there existed recurring classes of errors present in all modules independent of size. These error classes are only roughly defined, so examples of these abstract error types are presented below. It should be noted that even though the authors were consistent with the categorization for this project, another error analyst may have interpreted the categories differently.

Failure to initialize or re-initialize a data structure properly upon a module's entry or exit would be considered an initialization error. Errors which caused an incorrect path in a module to be taken were considered control errors. Such a control error might be a conditional statement causing control to be passed to an incorrect path. Interface errors were those associated with structures existing outside the module's local environment but which the module used. For example, the incorrect declaration of a COMMON segment or an incorrect subroutine call would be an interface error. An error in the declaration of the COMMON segment was considered an interface error and not an initialization error, since the COMMON segment was used by the module but was not part of its local environment. Data errors were those errors which resulted from the incorrect use of a data structure. Examples of data errors would be the use of incorrect subscripts for an array, the use of the wrong variable in an equation, or the inclusion of an incorrect declaration of a variable local to the module. Computation errors were those which caused a computation to evaluate a variable's value erroneously. These errors could be equations which were incorrect not by virtue of the incorrect use of a data structure within the statement but rather by miscalculation. An example of this error might be the statement A = B + 1 when the statement really needed was A = B/C + 1.

These five abstract categories basically represent all activities present in any module. The five categories were further partitioned into errors of commission and omission. Errors of commission were those errors present as a result of an incorrect executable statement. For example, a computational error of commission would be A = B * C where the '*' should have been '+'. In other words, the operator was present but was incorrect. Errors of omission were those errors which were a result of forgetting to include some entity within a module. For example, a computational omission error might be A = B when the statement should have read A = B + C. A parameter required for a subroutine call but not included in the actual call would be an example of an interface omission error. In both of these examples, some aspect needed for the correct execution of a module was forgotten.

The results of this abstract classification scheme are given in Figure 8. Since approximately equal numbers of new (49) and modified (47) modules were examined in the analysis, the results do not need to be normalized. Some errors, and thereby modules, were counted more than once, since it was not possible to associate some errors with a single abstract error type based on the error description given on the change report form.


                    commission          omission
                   new  modified      new  modified
initialization       2       9          5       9
control             12       2         16       6
interface           23      31         27       6
data                10      17          1       3
computation         16      21          3       3
% of total         28%     36%        23%     12%
                       64%                35%


                   new  modified    total
initialization       7      18        25   (11%)
control             28       8        36   (16%)
interface           50      37        87   (39%)
data                11      20        31   (14%)
computation         19      24        43   (19%)
total              115     107       222

ABSTRACT CLASSIFICATION OF ERRORS
Figure 8
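The totals and percentages in Figure 8 follow from its commission/omission counts. This Python sketch recomputes them; it assumes the percentages are shares of all 222 classified errors (115 new plus 107 modified), which agrees with the printed category percentages. Because some errors were counted in more than one category, 222 exceeds the 215 errors reported earlier.

```python
# Figure 8 counts as (new, modified) pairs for errors of commission and omission.
FIGURE_8 = {
    "initialization": {"commission": (2, 9), "omission": (5, 9)},
    "control": {"commission": (12, 2), "omission": (16, 6)},
    "interface": {"commission": (23, 31), "omission": (27, 6)},
    "data": {"commission": (10, 17), "omission": (1, 3)},
    "computation": {"commission": (16, 21), "omission": (3, 3)},
}


def category_totals(name):
    """Return (new, modified, total) for one abstract error category."""
    kinds = FIGURE_8[name].values()
    new = sum(n for n, _ in kinds)
    modified = sum(m for _, m in kinds)
    return new, modified, new + modified


grand_total = sum(category_totals(name)[2] for name in FIGURE_8)

for name in FIGURE_8:
    new, modified, total = category_totals(name)
    print(f"{name:15s} {new:4d} {modified:4d} {total:4d} ({total / grand_total:.0%})")
print("total", grand_total)
```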


According to Figure 8, interfaces appear to be the major problem regardless of the module type. Control is more of a problem in new modules than in modified modules, probably because the algorithms in the old modules had more test and debug time. On the other hand, initialization and data are more of a problem in modified modules. These facts, coupled with the small number of errors of omission in the modified modules, might imply that the basic algorithms for the modified modules were correct but needed some adjustment with respect to data values and initialization for the application of that algorithm to the new environment.


3.6 MODULE SIZE AND ERROR OCCURRENCE 


Scatter plots of executable lines per module versus the number of errors found in the module were plotted. It was difficult to see any trend within these plots, so the number of errors per 1000 executable lines within each module size class was calculated (Table 4).


Module Size    Errors/1000 lines
    50               16.0
   100               12.6
   150               12.4
   200                7.6
  >200                6.4

ERRORS/1000 EXECUTABLE LINES (INCLUDES ALL MODULES)
Table 4


The number of errors was normalized per 1000 executable lines of code in order to determine whether the number of detected errors within a module was dependent on module size. All modules within the software were included, even those with no errors detected. If the number of errors per 1000 executable lines were found to be constant over module size, this would show independence. An unexpected trend was observed: Table 4 implies that there is a higher error rate within smaller modules. Since only the executable lines of code were considered, the larger modules were not COMMON data files. Also, the larger modules will be shown to be more complex than smaller modules in the next section. How, then, could this type of result occur?

The most plausible explanation seems to be that, since there are a large number of interface errors and these are spread equally across all modules, there are a larger number of errors per 1000 executable statements for smaller modules. Some tentative explanations for this behavior are: the majority of the modules examined were small (Table 1), causing a biased result; larger modules were coded with more care than smaller modules because of their size; and errors in smaller modules are more apparent, so there may indeed still be numerous undetected errors present within the larger modules, since all the "paths" within the larger modules may not yet have been fully exercised.
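The normalization behind Table 4 can be sketched as follows. The text does not say whether rates were pooled per size class or averaged per module; this Python sketch assumes pooling (total errors divided by total executable lines in the class, times 1000). The function names and the module data in the example are hypothetical.

```python
from collections import defaultdict


def size_label(lines, width=50, pool_above=200):
    """Size class label: 0-50 -> '50', 51-100 -> '100', ...; larger -> '>200'."""
    size = max(width, -(-lines // width) * width)  # round up to a multiple of width
    return str(size) if size <= pool_above else f">{pool_above}"


def errors_per_kloc(modules):
    """Errors per 1000 executable lines for each size class.

    `modules` holds (executable_lines, errors) pairs for every module,
    including modules in which no errors were found.
    """
    lines = defaultdict(int)
    errors = defaultdict(int)
    for n_lines, n_errors in modules:
        label = size_label(n_lines)
        lines[label] += n_lines
        errors[label] += n_errors
    return {label: 1000 * errors[label] / lines[label]
            for label in lines if lines[label] > 0}


if __name__ == "__main__":
    sample = [(40, 1), (30, 0), (120, 2), (450, 3)]  # hypothetical modules
    print(errors_per_kloc(sample))
```

Pooling weights each class by its total size; averaging per-module rates instead would let many tiny modules dominate a class, so the choice matters when comparing against Table 4.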


3.7 MODULE COMPLEXITY

Cyclomatic complexity [5] (number of decisions + 1) was correlated with module size. This was done in order to determine whether or not larger modules were less dense or complex than smaller modules containing errors. Scatter plots of executable statements per module versus cyclomatic complexity were plotted and, again, since it was difficult to see any trend in the plots, modules were grouped according to size. The complexity points were obtained by calculating an average complexity measure for each module size class. For example, all the modules which had 50 executable lines of code or less had an average complexity of 6.0. Table 5 gives the average cyclomatic complexity for all modules within each of the size categories. The relationship between complexity and executable lines of code within a module is shown in Figure 9. As can be seen from the table, the larger modules were more complex than smaller modules.
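Cyclomatic complexity can be computed either from a count of decisions, as in the text, or from a control-flow graph using McCabe's graph formula V(G) = E - N + 2P (edges, nodes, connected components); for a single-entry, single-exit module built from binary decisions the two agree. This Python sketch shows both, plus the per-class averaging described above; the function names, graph, and module data are hypothetical, and the size classes assume the same 50-line grouping as Table 1.

```python
def complexity_from_decisions(decisions: int) -> int:
    """Cyclomatic complexity as defined in the text: decisions + 1."""
    return decisions + 1


def complexity_from_graph(edges, nodes, components: int = 1) -> int:
    """McCabe's graph form V(G) = E - N + 2P for a control-flow graph."""
    return len(edges) - len(nodes) + 2 * components


def average_complexity_by_size(modules, width=50, pool_above=200):
    """Average complexity per size class; `modules` holds (executable_lines, decisions)."""
    sums, counts = {}, {}
    for n_lines, decisions in modules:
        size = max(width, -(-n_lines // width) * width)
        label = str(size) if size <= pool_above else f">{pool_above}"
        sums[label] = sums.get(label, 0) + complexity_from_decisions(decisions)
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


if __name__ == "__main__":
    # IF-THEN-ELSE: one decision, so both forms should give 2.
    nodes = {"entry", "then", "else", "exit"}
    edges = [("entry", "then"), ("entry", "else"), ("then", "exit"), ("else", "exit")]
    print(complexity_from_graph(edges, nodes), complexity_from_decisions(1))
    print(average_complexity_by_size([(30, 4), (45, 6), (120, 20)]))
```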


Module Size    Average Cyclomatic Complexity
    50                    6.0
   100                   17.9
   150                   28.1
   200                   52.7
  >200                   60.0

AVERAGE CYCLOMATIC COMPLEXITY FOR ALL MODULES
Table 5


[Figure 9 image not reproduced; cyclomatic complexity versus module size (executable lines)]
Figure 9


For only those modules containing errors, Table 6 gives the number of errors per 1000 executable statements and the average cyclomatic complexity. When this data is compared with Table 5, one can see that the average complexity of the error prone modules was no greater than the average complexity of the full set of modules.



Module    Average Cyclomatic    Errors/1000
Size      Complexity            executable lines
  50             6.2                 65.0
 100            19.6                 33.3
 150            27.5                 24.6
 200            56.7                 13.4
>200            77.5                  9.7

COMPLEXITY AND ERROR RATE FOR ERRORED MODULES
Table 6


4.0 DATA NOT EXPLICITLY INCLUDED IN ANALYSES 

The 147 segments not included in this study (i.e., assembly segments, common segments, utility routines) contained a total of six errors, detected within three different segments. One error occurred in a modified assembly module and was due to the misunderstanding or incorrect statement of the functional specifications for the module. The effort needed to correct this error was minimal (1 hour or less).

The other five errors occurred in two separate new data segments, with the major cause of the errors also related to their specifications. The effort needed to correct these errors was, on average, from 1 hour to 1 day (1 day representing 8 hours).


5.0 CONCLUSIONS 


The data contained in this paper helps explain and characterize the environment in which the software was developed. It is clear from the data that this was a new application domain with changing requirements.

Modified and new modules were shown to behave similarly except in the types of errors prevalent in each and the amount of effort required to correct an error. Both had a high percentage of interface errors; however, new modules had roughly equal numbers of errors of omission and commission and a higher percentage of control errors, while modified modules had a high percentage of errors of commission, a small percentage of errors of omission, and a higher percentage of data and initialization errors. Another difference was that modified modules appeared to be more susceptible to errors due to the misunderstanding of the specifications. Misunderstanding of a module's specifications or requirements constituted the majority of errors detected. This duplicates an earlier result of Endres [2], which implies that more work needs to be done on the form and content of specifications and requirements in order to enable them to be used across applications more effectively.

Some disadvantages were shown to modifying an existing module for use instead of creating a new module. Modifying an existing module to meet a similar but different set of specifications reduces the development cost of that module. The disadvantage is that there are hidden costs. Errors contained in modified modules were found to require more effort to correct than those in new modules, although the two classes contained approximately the same number of errors. The majority of these errors were due to incorrect or misinterpreted specifications for a module. Therefore, there is a tradeoff between minimizing development time and the time spent aligning a module to new specifications. If better specifications could be developed, however, they might reduce the more expensive errors contained within modified modules. In this case, the reuse of "old" modules could be more beneficial in terms of cost and effort, since the hidden costs would have been reduced.

One surprising result was that module size did not 
account for error proneness. In fact, it was quite the 
contrary: the larger the module, the less error prone it was. 
This was true even though the larger modules were more 
complex. Additionally, the error prone modules were no more 
complex across size groupings than the error free modules. 

In general, investigations of the type presented in 
this paper, relating error and other change data to the 
software in which the errors occurred, are important and 
relevant. They are the only method by which our knowledge of 
these types of relationships will ever increase and evolve. 





Acknowledgments 


The authors would like to thank F. McGarry, NASA Goddard, 
for his cooperation in supplying the information needed for 
this study and his helpful suggestions on earlier drafts of 
this paper. 


References 

(1) Mendenhall, W. and Ramey, M., Statistics for Psychology, 
Duxbury Press, North Scituate, Mass., 1973, pp. 280-315. 

(2) Endres, A., "An Analysis of Errors and their Causes in 
System Programs", Proceedings of the International 
Conference on Software Engineering, April 1975, pp. 327-336. 

(3) Belady, L.A. and Lehman, M.M., "A Model of Large Program 
Development", IBM Systems Journal, Vol. 15, 1976, pp. 225-251. 

(4) Weiss, D.M., "Evaluating Software Development by Error 
Analysis: The Data from the Architecture Research Facility", 
The Journal of Systems and Software, Vol. 1, 1979, pp. 57-70. 

(5) Schneidewind, N.F., "An Experiment in Software Error 
Data Collection and Analysis", IEEE Transactions on Software 
Engineering, Vol. SE-5, No. 3, May 1979, pp. 276-286. 

(6) McCabe, T.J., "A Complexity Measure", IEEE Transactions 
on Software Engineering, Vol. SE-2, No. 4, Dec. 1976, 
pp. 308-320. 

(7) Basili, V. and Freburger, K., "Programming Measurement 
and Estimation in the Software Engineering Laboratory", The 
Journal of Systems and Software, Vol. 2, 1981, pp. 47-57. 

(8) Weiss, D.M., "Evaluating Software Development by 
Analysis of Change Data", University of Maryland Technical 
Report TR-1120, November 1981. 





Figure: Change Report Form 
















THE VIEWGRAPH MATERIALS 
for the 
V. BASILI PRESENTATION FOLLOW 





SOFTWARE ERRORS AND COMPLEXITY: AN 

EMPIRICAL INVESTIGATION 

VICTOR R. BASILI 
BARRY T. PERRICONE 

UNIVERSITY OF MARYLAND 





STUDY OVERVIEW 


STUDY THE ERRORS COMMITTED IN DEVELOPING SOFTWARE 
REVIEW THE RESULTS IN LIGHT OF THOSE FROM OTHER STUDIES 
ANALYZE THE RELATIONSHIP BETWEEN ERRORS AND COMPLEXITY 





PROJECT BACKGROUND 


GENERAL PURPOSE PROGRAM FOR SATELLITE PLANNING STUDIES 

SIZE: 90K SOURCE LINES / 517 CODE SEGMENTS 

370 FORTRAN SUBROUTINES / 36 ASSEMBLY SEGMENTS / 111 
COMMON MODULES, BLOCK DATA, UTILITY ROUTINES 

MODIFIED MODULES - ADOPTED FROM A PREVIOUS SYSTEM (72%) 

NEW MODULES - DEVELOPED SPECIFICALLY FOR THIS SYSTEM 

REQUIREMENTS FOR THE SYSTEM KEPT GROWING AND CHANGING OVER THE 
LIFE CYCLE 

ERRORS: TWO DEFINITIONS - TEXTUAL (215) AND CONCEPTUAL (155) 

49% ERRORS IN MODIFIED MODULES 
51% ERRORS IN NEW MODULES 

ERROR CORRECTIONS VS. MODIFICATIONS 

38% OF CHANGES WERE MODIFICATIONS 
62% OF CHANGES WERE ERROR CORRECTIONS 



NUMBER OF MODULES 

                   ALL MODULES             MODULES WITH ERRORS 
NUMBER OF LINES    SOURCE   EXECUTABLE     SOURCE   EXECUTABLE 
0-50                 53        252            3         49 
51-100              107         70           16         25 
101-150              30         26           20         13 
151-200              56         15           10          7 
201-250              34          1           12          1 
251-300              14          1            0          0 
301-350               7          1            4          1 
351-400               9          0            7          0 
>400                 10          0            6          0 
TOTAL               370        370           96         96 




NUMBER OF MODULES AFFECTED BY AN ERROR (DATA SET: 211 TEXTUAL ERRORS, 
174 CONCEPTUAL ERRORS) 

# ERRORS       # MODULES AFFECTED 
155 (89%)      1 

RESULTS: SIMILAR TO OTHER STUDIES, FEW ERRORS INVOLVE 
MORE THAN ONE MODULE 





                                        NUMBER OF ERRORS   AVERAGE EFFORT 
                                        (12 TOTAL)         TO CORRECT 

MISUNDERSTOOD OR INCORRECT 
REQUIREMENTS                                 3             32 HOURS 

INCORRECT DESIGN OR IMPLEMENTATION 
OF A MODULE                                  3             0.5 HOURS 

CLERICAL ERROR                                             0.5 HOURS 

EFFORT TO CORRECT ERRORS IN THE TWO MOST ERROR PRONE 
NEW MODULES 





NUMBER OF ERRORS PER MODULE (DATA SET: 219 ERRORS) 

#ERRORS/MODULE    MODULES    NEW    MODIFIED 
1                    36       17       19 
2                    26       13       13 
3                    16       10        6 
4                    13        7        6 





                                        NUMBER OF ERRORS   AVERAGE EFFORT 
                                        (15 TOTAL)         TO CORRECT 

MISUNDERSTOOD OR INCORRECT 
SPECIFICATIONS                               8             24 HOURS 

INCORRECT DESIGN OR IMPLEMENTATION 
OF A MODULE COMPONENT                        5             16 HOURS 

CLERICAL ERROR                               2             4.5 HOURS 

EFFORT TO CORRECT ERRORS IN THREE MOST ERROR PRONE 
MODIFIED MODULES 
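The table gives a separate average for each category, so a single figure for all 15 errors has to weight each average by its error count. A minimal sketch of that arithmetic, assuming each "average effort" is a mean over the errors in its category (the viewgraph does not say so explicitly); the dictionary name is only illustrative:

```python
# Error counts and average correction effort (hours) per category for the
# three most error prone modified modules, as listed in the table above.
modified_modules = {
    "misunderstood or incorrect specifications": (8, 24.0),
    "incorrect design or implementation of a module component": (5, 16.0),
    "clerical error": (2, 4.5),
}


def weighted_mean_effort(table):
    """Return (total errors, total hours, mean hours per error).

    Assumes each category's figure is a per-error average, so that
    count * average gives the category's total effort.
    """
    total_errors = sum(count for count, _ in table.values())
    total_hours = sum(count * hours for count, hours in table.values())
    return total_errors, total_hours, total_hours / total_errors


errors, hours, mean = weighted_mean_effort(modified_modules)
spec_count, spec_avg = modified_modules["misunderstood or incorrect specifications"]
spec_share = spec_count * spec_avg / hours  # fraction of effort on specification errors
print(f"{errors} errors, {hours:.1f} hours, {mean:.1f} hours per error")
print(f"specification errors: {100 * spec_share:.0f}% of the correction effort")
```

Under that assumption the three modules took about 281 hours to correct, roughly 18.7 hours per error, and specification errors account for about 68% of the effort.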





ERROR DISTRIBUTION BY TYPE 


CATEGORIES: 

A: REQUIREMENTS INCORRECT OR MISINTERPRETED 
B: FUNCTIONAL SPECIFICATION INCORRECT OR MISINTERPRETED 
C: DESIGN ERROR INVOLVING SEVERAL COMPONENTS 
D: DESIGN ERROR IN A SINGLE COMPONENT 
E: MISUNDERSTANDING OF EXTERNAL ENVIRONMENT 
F: ERRORS IN PROGRAMMING LANGUAGE OR COMPILER 
G: CLERICAL ERROR 
H: ERROR DUE TO PREVIOUS MISCORRECTION OF AN ERROR 





Figures: errors observed by type of error (categories A-H); a second chart 
shows the data from the previous slide, adjusted for differences in 
counting schemes. 



ABSTRACT ERROR TYPES 


CATEGORIES: 

INITIALIZATION - FAILURE TO INITIALIZE DATA ON ENTRY/EXIT 
CONTROL STRUCTURE - INCORRECT PATH TAKEN 
INTERFACE - ASSOCIATED WITH STRUCTURES OUTSIDE THE MODULE'S 
ENVIRONMENT 
DATA - INCORRECT USE OF A DATA STRUCTURE 
COMPUTATION - ERRONEOUS EVALUATION OF A VARIABLE'S VALUE 

COMMISSION - INCORRECT EXECUTABLE STATEMENT 
OMISSION - NEGLECTING TO INCLUDE SOME ENTITY IN A MODULE 


RESULT: LARGEST PERCENT OF ERRORS INVOLVE INTERFACE (39%) 

CONTROL MORE OF A PROBLEM IN NEW MODULES 
DATA AND INITIALIZATION MORE OF A PROBLEM IN MODIFIED 
MODULES 

SMALL NUMBER OF OMISSION ERRORS IN MODIFIED MODULES 

MIGHT IMPLY - BASIC ALGORITHMS FOR THE MODIFIED MODULES 
WERE CORRECT BUT NEEDED SOME ADJUSTMENT WITH RESPECT 
TO DATA VALUES AND INITIALIZATION FOR THE APPLICATION 
OF THE OLD ALGORITHM TO THE NEW APPLICATION 







                   COMMISSION            OMISSION 
                   NEW   MODIFIED        NEW   MODIFIED 
INITIALIZATION       2       9             5       9 
CONTROL             12       2            16       6 
INTERFACE           23      31            27       6 
DATA                10      17             1       3 
COMPUTATION         16      21             3       3 
                   28%     36%           23%     12% 


                   NEW   MODIFIED   TOTAL 
INITIALIZATION       7      18        25   (11%) 
CONTROL             28       8        36   (16%) 
INTERFACE           50      37        87   (39%) 
DATA                11      20        31   (14%) 
COMPUTATION         19      24        43   (19%) 
                   115     107 


ABSTRACT CLASSIFICATION OF ERRORS 
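The second table follows from the first: each category's NEW and MODIFIED totals add its commission and omission counts, and the percentages are shares of all 222 classified errors. A short sketch that redoes this arithmetic from the counts above; the dictionary layout is only an illustration:

```python
# Counts from the commission/omission table above:
# category -> (commission new, commission modified, omission new, omission modified)
counts = {
    "initialization": (2, 9, 5, 9),
    "control": (12, 2, 16, 6),
    "interface": (23, 31, 27, 6),
    "data": (10, 17, 1, 3),
    "computation": (16, 21, 3, 3),
}

totals = {name: sum(row) for name, row in counts.items()}
grand_total = sum(totals.values())
# Column sums: commission new, commission modified, omission new, omission modified.
column_totals = [sum(column) for column in zip(*counts.values())]

for name, (cn, cm, on, om) in counts.items():
    new, modified = cn + on, cm + om
    share = 100 * totals[name] / grand_total
    print(f"{name:<15} new={new:3d} modified={modified:3d} "
          f"total={totals[name]:3d} ({share:.0f}%)")
print("column totals:", column_totals, "of", grand_total)
```

The interface row, for example, gives 87 of 222 errors, which is the 39% quoted on the ABSTRACT ERROR TYPES viewgraph.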




MODULE SIZE     ERRORS/1000 LINES 
50                   16.0 
100                  12.6 
150                  12.4 
200                   7.6 
>200                  6.4 


ERRORS/1000 EXECUTABLE LINES (INCLUDES ALL MODULES) 


EXPLANATIONS: 

INTERFACE ERRORS SPREAD ACROSS ALL MODULES 

MAJORITY OF MODULES EXAMINED WERE SMALL BIASING THE RESULT 

LARGER MODULES WERE CODED WITH MORE CARE 

ERRORS IN SMALLER MODULES WERE MORE APPARENT 
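The viewgraph does not say exactly how its size classes or densities were computed. As a sketch, assuming each size label is the upper bound of a class (0-50, 51-100, and so on) and that a class's density pools all errors and all executable lines in that class, the computation could look like this. The module records and the helper names are hypothetical, not SEL data:

```python
from collections import defaultdict

# Hypothetical module records: (name, executable lines, errors).
modules = [
    ("A", 40, 1), ("B", 45, 0), ("C", 90, 2),
    ("D", 120, 1), ("E", 180, 1), ("F", 260, 2),
]

# Upper bounds of the size classes used on the viewgraph; the last class is open.
BOUNDS = [50, 100, 150, 200]


def size_class(lines):
    """Map a module's executable line count to its size-class label."""
    for bound in BOUNDS:
        if lines <= bound:
            return str(bound)
    return ">200"


def errors_per_kloc(records):
    """Pooled errors per 1000 executable lines for each size class."""
    lines_by_class = defaultdict(int)
    errors_by_class = defaultdict(int)
    for _, lines, errors in records:
        cls = size_class(lines)
        lines_by_class[cls] += lines
        errors_by_class[cls] += errors
    return {cls: 1000 * errors_by_class[cls] / lines_by_class[cls]
            for cls in lines_by_class}


density = errors_per_kloc(modules)
for cls, value in density.items():
    print(f"{cls:>5}: {value:.1f} errors/1000 executable lines")
```

Pooling weights each module by its size; averaging per-module densities instead would weight every module equally and can give a different trend, which is one reason the bias explanation listed above matters.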





MODULE SIZE     AVERAGE CYCLOMATIC COMPLEXITY 
50                        6.0 
100                      17.9 
150                      28.1 
200                      52.7 
>200                     60.0 


AVERAGE CYCLOMATIC COMPLEXITY FOR ALL MODULES 
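The complexity figures use McCabe's cyclomatic number (reference (6)), V(G) = E - N + 2P for a control-flow graph with E edges, N nodes, and P connected components. A minimal sketch for a single module; the example graph is hypothetical. For a single-entry, single-exit module with only two-way branches, the result equals the number of decisions plus one:

```python
def cyclomatic_complexity(graph, components=1):
    """McCabe's V(G) = E - N + 2P for a control-flow graph.

    graph maps each node to the list of its successor nodes;
    components is P, the number of connected components (1 for one module).
    """
    nodes = set(graph)
    for successors in graph.values():
        nodes.update(successors)
    edges = sum(len(successors) for successors in graph.values())
    return edges - len(nodes) + 2 * components


# Hypothetical control-flow graph: an IF/ELSE followed by a loop (two decisions).
cfg = {
    "entry": ["test"],
    "test": ["then", "else"],
    "then": ["loop"],
    "else": ["loop"],
    "loop": ["body", "exit"],
    "body": ["loop"],
    "exit": [],
}
print(cyclomatic_complexity(cfg))
```

The graph has 8 edges and 7 nodes, so V(G) = 8 - 7 + 2 = 3, matching its two decisions plus one.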





MODULE SIZE   AVERAGE CYCLOMATIC   ERRORS/1000 
              COMPLEXITY           EXECUTABLE LINES 
50                  6.2                 65.0 
100                19.6                 33.3 
150                27.5                 24.6 
200                56.7                 13.4 
>200               77.5                  9.7 


COMPLEXITY AND ERROR RATE FOR ERRORED MODULES 


RESULT: AVERAGE CYCLOMATIC COMPLEXITY GREW FASTER THAN SIZE 





CONCLUSIONS 


ERROR ANALYSIS PROVIDES USEFUL INFORMATION 

- CAN SEE NEW APPLICATION WITH CHANGING REQUIREMENTS 

- INSIGHTS INTO DIFFERENT ERRORS FOR NEW AND MODIFIED 

MODULES 

- MAJOR ERROR PROBLEMS WITH DIFFERENT APPLICATION EXPERIENCE 

- CAN COMPARE ENVIRONMENTS 

MODULE SIZE AN OPEN QUESTION WRT. ERRORS 

- THE LARGER THE MODULE (WITHIN LIMITS) THE LESS ERROR PRONE 

- WE ARE NOT READY TO PUT ARTIFICIAL LIMITS 

RECOMMENDATIONS: 

- THE ENVIRONMENT MUST BE BETTER UNDERSTOOD 

- MORE DATA MUST BE COLLECTED 

- MORE STUDIES MADE 



WHEN AND HOW TO USE A SOFTWARE RELIABILITY MODEL 

Amrit L. Goel¹, Victor R. Basili², 
and Peter M. Valdes³ 


Many analytical models were proposed during the last decade for 
software reliability assessment. These models served a useful purpose 
in identifying the need for an objective approach to determining the 
quality of a software system as it goes through various stages of 
development. However, by and large, these models have not been as widely 
and convincingly used as was expected. 

In this paper we attempt to identify the causes of this state of 
affairs and suggest some remedial actions. For example, we feel that 
very often the models are used without a clear understanding of their 
underlying assumptions and limitations. Also, there seems to be some 
misunderstanding about the interpretations of model inputs and outputs. 
To overcome some of these difficulties, we provide a classification of 
the available models and suggest which types of models are applicable 
in a given phase of the software development cycle. 

The work reported in this paper represents the first step towards 
developing a general methodology for assessing software quality and re- 
liability throughout the development cycle. Further work on this topic 
will be published in the near future. 


¹ Professor of Industrial Engineering and Operations Research; and 
Computer and Information Science, Syracuse University. Visiting 
Professor, University of Maryland, College Park, MD. 

² Chairman and Professor, Dept. of Computer Science, University of 
Maryland, College Park, MD. 

³ Graduate Assistant, University of Maryland. 


A. Goel 
Syracuse U. 
1 of 36 



THE VIEWGRAPH MATERIALS 
for the 

A. GOEL PRESENTATION FOLLOW 





WHEN AND HOW TO USE A SOFTWARE 
RELIABILITY MODEL 


Amrit L. Goel, Victor R. Basili 
and Peter M. Valdes 


SEVENTH ANNUAL SOFTWARE ENGINEERING WORKSHOP 

NASA/GSFC 

DECEMBER 1, 1982 




OUTLINE 


SOFTWARE RELIABILITY 
SOFTWARE RELIABILITY MODELS 
- CLASSIFICATION 
SOFTWARE DEVELOPMENT PHASES 
APPLICABILITY OF MODELS IN EACH PHASE 
DISCUSSION OF MAJOR MODEL ASSUMPTIONS 




SOFTWARE 


SOFTWARE (ALSO CALLED PROGRAM) 

IS ESSENTIALLY AN INSTRUMENT FOR 
TRANSFORMING A DISCRETE SET OF INPUTS 
(FROM INPUT DOMAIN) INTO A DISCRETE SET 
OF OUTPUTS (INTO ITS OUTPUT SPACE) 







SOFTWARE ERROR 


SOFTWARE ERROR IS A DISCREPANCY 
BETWEEN WHAT THE SOFTWARE DOES AND 
WHAT THE USER OR THE COMPUTING 
ENVIRONMENT (PHYSICAL MACHINE, O/S, 
COMPILER, ETC. ) WANTS IT TO DO. 




SOFTWARE RELIABILITY 


o THE PROBABILITY THAT SOFTWARE WILL NOT CAUSE THE 
FAILURE OF A SYSTEM TO PERFORM A REQUIRED TASK 
OR MISSION FOR A SPECIFIED TIME IN A SPECIFIED 
ENVIRONMENT. 


o AN ATTRIBUTE OF SOFTWARE QUALITY PERTAINING TO THE 
EXTENT TO WHICH A COMPUTER PROGRAM CAN BE EXPECTED 
TO PERFORM ITS INTENDED FUNCTION WITH REQUIRED 
PRECISION. 





SOFTWARE RELIABILITY 


LET E BE A CLASS OF ERRORS OF INTEREST AND T BE 
A MEASURE OF RELEVANT TIME (UNITS DETERMINED BY 
THE APPLICATION AT HAND), 

THEN THE RELIABILITY OF A SOFTWARE PACKAGE WITH 
RESPECT TO THE CLASS OF ERRORS E AND WITH RESPECT 
TO THE METRIC T IS THE PROBABILITY THAT NO ERROR 
OF THE CLASS OCCURS DURING THE EXECUTION OF THE 
PROGRAM FOR A PRESPECIFIED PERIOD OF RELEVANT 
TIME. 
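The definition leaves the underlying probability model open. As one illustration, if failures of class E follow the Goel-Okumoto nonhomogeneous Poisson process (one of the models named later in this presentation) with mean value function m(t) = a(1 - exp(-bt)), the probability that no failure occurs in the next x units of time after time t is R(x | t) = exp(-[m(t + x) - m(t)]). A sketch with hypothetical values of a and b:

```python
import math


def go_mean_value(t, a, b):
    """Goel-Okumoto mean value function m(t) = a * (1 - exp(-b t))."""
    return a * (1.0 - math.exp(-b * t))


def go_reliability(x, t, a, b):
    """P(no failure in (t, t + x]) for an NHPP with mean value m(t)."""
    return math.exp(-(go_mean_value(t + x, a, b) - go_mean_value(t, a, b)))


# Hypothetical parameters: a = 100 expected total errors, b = 0.02 per hour.
a, b = 100.0, 0.02
for t in (0, 100, 200, 300):
    # Reliability over a 10-hour mission starting after t hours of testing.
    print(t, round(go_reliability(10.0, t, a, b), 4))
```

Because m(t) flattens as testing proceeds, the same 10-hour period becomes more reliable the later it starts, which is the reliability growth these models are meant to capture.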




NEED FOR SOFTWARE RELIABILITY ASSESSMENT 


o ESTIMATE POTENTIAL RELIABILITY DURING CONCEPTUAL 
PHASE 

o ESTABLISH REALISTIC NUMERICAL RELIABILITY GOALS 
DURING DEFINITION PHASE 

o ESTABLISH EXISTING LEVELS OF ACHIEVED RELIABILITY 

o MONITOR PROGRESS TOWARD ACHIEVING SPECIFIED 
RELIABILITY GOALS OR REQUIREMENTS 

o ESTABLISH RELIABILITY CRITERIA FOR FORMAL 
QUALIFICATION 



GENERAL APPROACH 






FLOWCHART FOR SOFTWARE FAILURE DATA 
ANALYSIS AND DECISION MAKING 





Figure: input domain, program, and error history. 

TESTING PROCESS AND ERROR HISTORY 





SOFTWARE RELIABILITY MODELS 


TIME-DEPENDENT MODELS 

ASSUMPTIONS OF MODELS EMPHASIZING DETECTION PROCESS 

FAILURES ARE INDEPENDENT 

NUMBER OF FAILURES IS CONSTANT 

EACH FAILURE IS REPAIRED BEFORE TESTING CONTINUES 

INPUTS WHICH EXERCISE THE PROGRAM ARE RANDOMLY SELECTED 

ALL FAILURES ARE OBSERVABLE 

TESTING IS OF UNIFORM INTENSITY AND REPRESENTATIVE OF OPERATIONAL ENVIRONMENT 
FAILURE RATE AT ANY TIME IS PROPORTIONAL TO CURRENT NUMBER OF FAILURES 
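The Jelinski-Moranda model (1972) is a standard times-between-failures model built on these assumptions: with N initial errors and a per-error rate phi, the failure rate during the i-th interval is phi * (N - i + 1), and inter-failure times are independent and exponential. A sketch with hypothetical parameter values; the function names are illustrative:

```python
import math


def jm_hazard(i, total_errors, phi):
    """Jelinski-Moranda failure rate during the i-th interval (i = 1, 2, ...).

    The rate is proportional to the errors still present, N - (i - 1),
    assuming each detected error is removed at once and perfectly.
    """
    remaining = total_errors - (i - 1)
    if remaining < 0:
        raise ValueError("more failures than assumed initial errors")
    return phi * remaining


def jm_interval_reliability(x, i, total_errors, phi):
    """P(the i-th inter-failure time exceeds x) = exp(-lambda_i * x)."""
    return math.exp(-jm_hazard(i, total_errors, phi) * x)


N, phi = 30, 0.01  # hypothetical: 30 initial errors, 0.01 failures/hour per error
for i in (1, 10, 20, 30):
    rate = jm_hazard(i, N, phi)
    # interval, failure rate, mean time to next failure, P(survive 5 hours)
    print(i, round(rate, 2), round(1 / rate, 1),
          round(jm_interval_reliability(5.0, i, N, phi), 3))
```

Each repair lowers the rate by the same step phi, which is exactly the "same failure rate for each error" assumption criticized in the limitations that follow.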




OVERVIEW OF SOFTWARE RELIABILITY MODELS 

Figure: software failure models, grouped as times-between-failures 
models and failure counting process models. Models shown: 

JELINSKI & MORANDA (1972) 
SCHICK & WOLVERTON (1973) 
LITTLEWOOD & VERRALL (1975) 
SHOOMAN (1972) 
MUSA (1975) 
SCHNEIDEWIND (1975) 
MORANDA (1975) (TWO ENTRIES) 
LITTLEWOOD (1976) 
FORMAN & SINGPURWALLA (1978) 
GOEL & OKUMOTO (1978) 
GOEL & OKUMOTO (1979) 
GOEL & OKUMOTO (1981) 


SOFTWARE RELIABILITY MODELS 
TIME-INDEPENDENT MODELS 


USE OBSERVED RESULTS OF EXPERIMENTS CONDUCTED ON ELEMENTS OF THE 
PROGRAM'S INPUT SPACE 


USE A-PRIORI KNOWLEDGE OF INPUT SPACE 

TWO CLASSES 
ERROR SEEDING 
INPUT SPACE SAMPLING 
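Input space sampling can be made concrete with the estimator usually attributed to Nelson: run n inputs drawn at random from the operational profile, count the f that fail, and estimate per-run reliability as 1 - f/n. When the input domain is partitioned into equivalence classes, the per-class failure fractions can be weighted by the profile probabilities. A sketch under those assumptions, with made-up numbers:

```python
def nelson_reliability(failures, runs):
    """Per-run reliability estimate 1 - f/n from n runs sampled at random
    from the operational input profile."""
    if runs <= 0:
        raise ValueError("need at least one run")
    return 1.0 - failures / runs


def partitioned_reliability(classes):
    """classes: list of (profile probability p_j, runs n_j, failures f_j),
    one per equivalence class of the input domain.
    Returns 1 - sum_j p_j * f_j / n_j."""
    total_p = sum(p for p, _, _ in classes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("profile probabilities must sum to 1")
    return 1.0 - sum(p * f / n for p, n, f in classes)


print(nelson_reliability(3, 200))
print(partitioned_reliability([(0.7, 100, 1), (0.2, 50, 2), (0.1, 20, 2)]))
```

The estimate is only as good as the profile: if the tests are not drawn from the distribution users will actually apply, 1 - f/n describes the test inputs rather than operation.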







ASSUMPTIONS 


I. TIMES BETWEEN FAILURES MODELS 

- INDEPENDENT INTERFAILURE TIMES 

- EQUAL PROBABILITY OF EXPOSING EMBEDDED ERRORS 

- ERRORS EMBEDDED ARE INDEPENDENT 

- TIME-DEPENDENCE 

- IMMEDIATE ERROR REMOVAL, PERFECT ERROR REMOVAL, 
  NONINTRODUCTION OF NEW ERRORS 

- RELIABILITY BASED ON REMAINING NUMBER OF ERRORS 

II. FAILURE COUNTING MODELS 

- ERRORS IN NONOVERLAPPING TIME INTERVALS ARE INDEPENDENT 

- FAILURE RATE PROPORTIONAL TO EXPECTED ERROR CONTENT 

- DECREASING FAILURE RATE WITH TIME (DISCRETE OR 
  CONTINUOUS) 

III. ERROR SEEDING MODELS 

- INDIGENOUS AND SEEDED ERRORS HAVE EQUAL PROBABILITY 
  OF BEING DETECTED 

IV. INPUT DOMAIN BASED MODELS 

- INPUT PROFILE DISTRIBUTION IS KNOWN 

- RANDOM TESTING IS USED 

- INPUT DOMAIN CAN BE PARTITIONED INTO EQUIVALENCE CLASSES 
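Assumption III is what makes the classic error seeding estimate (often attributed to Mills) work: if seeded and indigenous errors are equally likely to be found, the fraction of seeded errors found estimates the fraction of indigenous errors found. A minimal sketch with hypothetical counts:

```python
def seeding_estimate(seeded, seeded_found, indigenous_found):
    """Error-seeding estimate of the total number of indigenous errors.

    With equal detection probability, seeded_found / seeded is approximately
    indigenous_found / N, so N is approximately
    indigenous_found * seeded / seeded_found.
    Returns (estimated total, estimated number still undetected).
    """
    if seeded_found == 0:
        raise ValueError("no seeded errors found yet; estimate undefined")
    total = indigenous_found * seeded / seeded_found
    return total, total - indigenous_found


# Hypothetical test campaign: 20 errors seeded, 15 of them recovered,
# 30 indigenous errors found along the way.
total, remaining = seeding_estimate(seeded=20, seeded_found=15, indigenous_found=30)
print(total, remaining)
```

If seeded errors are easier to find than real ones, the ratio seeded_found / seeded is too high and the estimate of remaining errors is too low.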





SOME LIMITATIONS OF MOST MODELS 


O INDEPENDENCE OF TIMES BETWEEN FAILURES 


O EQUAL IMPORTANCE TO DIFFERENT TYPES OF 
ERRORS 

O SAME FAILURE RATE FOR EACH ERROR 


O NO PROVISION FOR INTRODUCTION OF NEW 
ERRORS 


O DECREASING FAILURE RATE DURING DEBUGGING 
OR OPERATION 





INDEPENDENT INTERFAILURE TIMES 


NOT A REALISTIC ASSUMPTION IN GENERAL, ESPECIALLY 
WHEN THE TESTING PROCESS IS NOT RANDOM. TIME TO 
NEXT FAILURE MAY VERY WELL DEPEND ON THE NATURE OF 
THE PREVIOUS FAILURE. IF THE PREVIOUS ERROR WAS 
CRITICAL, WE MIGHT INTENSIFY TESTING AND LOOK FOR 
ADDITIONAL CRITICAL ERRORS, WHICH IMPLIES 
NONINDEPENDENT INTERFAILURE TIMES. 

NHPP (NONHOMOGENEOUS POISSON PROCESS) TYPE MODELS ARE 
ROBUST TO SUCH LACK OF INDEPENDENCE. 
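Because NHPP models are usually fitted to observed failure data, a sketch of maximum-likelihood estimation for the Goel-Okumoto model may help. It assumes failure times t_1, ..., t_n recorded over a test period of length T; the score equation in the docstring follows from the standard NHPP log-likelihood, and the failure times are invented for illustration:

```python
import math


def fit_goel_okumoto(times, horizon, rel_tol=1e-10):
    """Maximum-likelihood estimates (a, b) of the Goel-Okumoto NHPP with
    mean value function m(t) = a * (1 - exp(-b * t)).

    times   -- failure times observed during testing on [0, horizon]
    horizon -- length T of the observation period
    The estimate of b solves n/b - sum(t_i) - n*T / (exp(b*T) - 1) = 0,
    and then a = n / (1 - exp(-b*T)). A finite solution exists only when
    the mean failure time is below T/2 (failures are thinning out).
    """
    n, T = len(times), float(horizon)
    s = float(sum(times))
    if n == 0 or s <= 0 or s / n >= T / 2:
        raise ValueError("no finite MLE for these data")

    def score(b):
        # d(log-likelihood)/db with a replaced by its estimate n / (1 - exp(-bT)).
        x = b * T
        tail = n * T / math.expm1(x) if x < 700 else 0.0
        return n / b - s - tail

    # The score decreases in b: positive near 0, negative for large b.
    lo, hi = 1e-9 / T, 1.0 / T
    while score(hi) > 0:          # widen the bracket until the sign changes
        lo, hi = hi, 2.0 * hi
    while hi - lo > rel_tol * hi:  # bisection on the score equation
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    a = n / -math.expm1(-b * T)
    return a, b


# Hypothetical failure times (hours) in a 100-hour test period.
times = [2, 5, 9, 14, 22, 31, 45, 62, 85]
a, b = fit_goel_okumoto(times, 100)
print(f"a = {a:.2f} expected total errors, b = {b:.4f} per hour")
print(f"expected errors remaining: {a - len(times):.2f}")
```

Bisection is used because the score is monotone in b, so it needs no derivative and cannot diverge; math.expm1 keeps the small-b terms accurate.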




SOFTWARE FAILURE RATE IS PROPORTIONAL TO NUMBER 
OF REMAINING ERRORS 


DOES NOT HOLD IN MANY CASES. 

REMAINING ERRORS THAT RESIDE IN THE FREQUENTLY 
USED PORTION OF THE CODE ARE MORE LIKELY TO BE 
DETECTED THAN OTHERS. 

IF, HOWEVER, TESTING IS REPRESENTATIVE OF USE, 
FAILURE RATE COULD BE CONSIDERED PROPORTIONAL TO 
ERROR CONTENT. 




ERRORS DETECTED ARE IMMEDIATELY CORRECTED 


NOT A REALISTIC ASSUMPTION IN MOST PRACTICAL 
SITUATIONS. 





CORRECTION PROCESS DOES NOT INTRODUCE NEW ERRORS 


VERY RARELY SATISFIED IN PRACTICE. A PARTIAL 
SOLUTION WAS ATTEMPTED IN THE IMPERFECT DEBUGGING 
MODEL, BUT A GENERAL SOLUTION IS NOT AVAILABLE. 





TESTING PROCESS IS REPRESENTATIVE OF 
OPERATIONAL ENVIRONMENT 


THIS IS RARELY TRUE. WE PREFER A RELIABILITY 
MEASURE BASED ON USER REQUIREMENTS RATHER THAN A 
SIMPLE UNCONDITIONED SOFTWARE RELIABILITY MEASURE. 




USE OF EXECUTION TIME BETWEEN FAILURES 


HAVE TO USE IT WITH CAUTION. ONE DEBUGGER COULD 
RUN AND RERUN THE PROGRAM TO UNCOVER REMAINING 
ERRORS, CAUSING HIGH EXECUTION TIME BETWEEN FAILURES, 
WHILE ANOTHER ONE MIGHT ANALYZE THE PROGRAM IN DETAIL 
AND THEN RUN THE (SAME) PROGRAM JUDICIOUSLY. THE FORMER 
CASE WOULD GIVE A WRONG IMPRESSION OF HIGHER 
RELIABILITY. 





INCREASING FAILURE RATE BETWEEN FAILURES 


CONTRARY TO THE ASSUMPTION THAT SOFTWARE DOES NOT 
WEAR OUT, BUT THIS WOULD BE SO IF TESTING INTENSITY 
INCREASES DURING SUCH INTERVALS. OVERALL, NOT A 
REALISTIC ASSUMPTION. 






SOFTWARE DEVELOPMENT PHASES 


DESIGN 

UNIT TESTING 
INTEGRATION TESTING 
ACCEPTANCE TESTING 
OPERATION 





APPLICABILITY OF EXISTING SOFTWARE 
RELIABILITY MODELS 


DESIGN 

- EXISTING MODELS NOT APPLICABLE. 

UNIT TESTING 

- SEEDING MODELS APPLICABLE IF WE CAN ASSUME THAT 
  INDIGENOUS AND SEEDED ERRORS HAVE EQUAL 
  PROBABILITIES OF DETECTION. 

- INPUT DOMAIN BASED MODELS MAY BE APPLICABLE. 

- TBF (TIMES BETWEEN FAILURES) AND FC (FAILURE COUNT) 
  MODELS NOT APPLICABLE. 

INTEGRATION TESTING 

- ALL MODELS APPLICABLE IF RANDOM TESTING IS USED. 

- FC MODELS MAY BE APPLICABLE FOR DETERMINISTIC TESTING. 

ACCEPTANCE TESTING 

- INPUT DOMAIN BASED MODELS APPLICABLE. 

- ERROR SEEDING MODELS NOT APPLICABLE. 

- TBF AND FC MODELS DO NOT SEEM TO BE APPLICABLE AS 
  ERRORS ARE NOT IMMEDIATELY CORRECTED; SOME TBF AND 
  FC MODELS MAY BE ROBUST TO THIS REQUIREMENT. 

OPERATION 

- INPUT DOMAIN MODELS MAY BE APPLICABLE PROVIDED USER 
  INPUTS ARE RANDOM FROM THE INPUT PROFILE DISTRIBUTION. 





DESIGN PHASE 


o USER REQUIREMENTS ARE TRANSFORMED TO COMPUTER 
  COMPATIBLE SPECIFICATIONS. 

o DESIGN ERRORS MAY BE CORRECTED BY VISUAL 
  INSPECTION OR BY OTHER INFORMAL PROCEDURES. 

o EXISTING SOFTWARE RELIABILITY MODELS ARE NOT 
  APPLICABLE AT THIS STAGE BECAUSE 

- TEST CASES TO EXPOSE ERRORS REQUIRED 
BY SEEDING AND INPUT DOMAIN BASED 
MODELS DO NOT EXIST 

- ERROR HISTORY REQUIRED BY TIMES BETWEEN 
FAILURES AND FAILURE COUNT MODELS DOES 
NOT EXIST 





UNIT TESTING 


EACH MODULE HAS ITS OWN SPECIFIED INPUT DOMAIN AND 
OUTPUT SPECIFICATION. 

MODULE SPECIFICATION IS TRANSFORMED INTO A PROGRAM 
(CODING). 

TEST CASES BASED ON THE INPUT DOMAIN AND OUTPUT 
SPECIFICATION ARE DESIGNED TO EXPOSE ERRORS. THE 
TEST CASES DO NOT USUALLY FORM A REPRESENTATIVE 
SAMPLE OF THE OPERATIONAL PROFILE DISTRIBUTION. 

TIMES BETWEEN EXPOSURE OF ERRORS ARE NOT RANDOM SINCE 
TEST CASES ARE EXECUTED AND DESIGNED IN A DETERMINISTIC 
FASHION. 

EXPOSED ERRORS ARE CORRECTED (DEBUGGED). 




UNIT TESTING: RELIABILITY MODELS 


SEEDING MODELS ARE APPLICABLE IF WE CAN ASSUME THAT 
INDIGENOUS AND SEEDED ERRORS HAVE EQUAL PROBABILITIES 
OF DETECTION 

INPUT DOMAIN BASED MODELS MAY BE APPLICABLE 

IF TESTS CAN BE MATCHED WITH THE OPERATIONAL PROFILE 
DISTRIBUTION 

TBF AND FC MODELS NOT APPLICABLE 





INTEGRATION TESTING 


MODULES ARE INTEGRATED INTO SUBSYSTEMS OR INTO THE 
WHOLE SYSTEM. 

TEST CASES ARE GENERATED TO VERIFY THE CORRECTNESS 
OF THE WHOLE SYSTEM. 

DUE TO THE COMPLEXITY OF THE INTEGRATED SYSTEM, TEST 
CASES MAY BE GENERATED 

- RANDOMLY (BASED ON AN INPUT PROFILE DISTRIBUTION); 

- DETERMINISTICALLY (BASED ON A SET OF TEST CRITERIA). 

EXPOSED ERRORS ARE CORRECTED; HOWEVER, ADDITIONAL 
ERRORS MAY BE INTRODUCED. 





INTEGRATION TESTING: RELIABILITY MODELS 


ALL MODELS APPLICABLE IF RANDOM TESTING 
IS USED. 

FAILURE COUNT MODELS MAY BE ROBUST TO LACK 
OF INDEPENDENCE AND COULD BE USED FOR 
DETERMINISTIC TESTING. 





ACCEPTANCE TESTING 


SOFTWARE IS GIVEN TO "FRIENDLY USERS." 

THESE USERS GENERATE TEST CASES (USUALLY RANDOM) 
TO VERIFY SOFTWARE CORRECTNESS. THE GENERATED TEST 
CASES MAY BE ASSUMED REPRESENTATIVE OF THE OPERATIONAL 
PROFILE DISTRIBUTION. 

USUALLY EXPOSED ERRORS ARE NOT IMMEDIATELY CORRECTED. 





OPERATIONAL PHASE 


SOFTWARE IS PUT INTO USE. 

INPUTS MAY NOT BE RANDOM ANYMORE SINCE A USER 
MAY BE USING THE SAME SOFTWARE FUNCTION ON A 
ROUTINE BASIS. INPUTS MAY BE CORRELATED. 

ERRORS ARE NOT IMMEDIATELY CORRECTED. 

APPLICABLE MODELS (MAY NOT SATISFY ALL ASSUMPTIONS): 

INPUT DOMAIN BASED MODELS. 




PROBLEMS WITH RELIABILITY ASSESSMENT 


SOMETIMES MODELS ARE USED (SUCCESSFULLY OR OTHERWISE) WITH 
INCOMPLETE UNDERSTANDING OF UNDERLYING ASSUMPTIONS AND 
LIMITATIONS. 

ROBUSTNESS TO DEVIATIONS FROM ASSUMPTIONS IS NOT FULLY KNOWN. 

APPLICABILITY OF MODELS IN DIFFERENT ENVIRONMENTS NEEDS 
FURTHER WORK. 

MEASUREMENT (FOR RELIABILITY ASSESSMENT) IS DONE TOO LATE 
IN THE LIFE CYCLE. 

NEED FOR MODEL SIMPLICITY (USABILITY) VS. CAPTURING DETAILS 
OF REALITY NOT FULLY APPRECIATED. 





CURRENT ACTIVITIES 


- EXAMINING RELIABILITY MEASURES ACROSS ALL LIFE CYCLE 
PHASES 

- STUDYING EFFECTS OF TESTING ON RELIABILITY 

- EXPLORING USE OF TEST CRITERIA AS MEASURES OF QUALITY 

AND RELIABILITY 

- DEVELOPING RELATIONSHIPS BETWEEN DESIGN, COMPLEXITY, 

TESTING AND RELIABILITY 

BASICALLY STUDYING THE ENTIRE LIFE CYCLE RATHER THAN JUST 
THE FINAL TESTING PHASE FOR QUALITY AND RELIABILITY 
ASSESSMENT. 





SOFTWARE PROTOTYPING IN THE SOFTWARE ENGINEERING LABORATORY 


MARVIN V. ZELKOWITZ 

DEPARTMENT OF COMPUTER SCIENCE 
UNIVERSITY OF MARYLAND 
COLLEGE PARK, MARYLAND 20742 


INTRODUCTION 

Over the last few years, several techniques have become popular within 
the software engineering world. Concepts like "structured programming," 
"distributed processing," "expert systems," and others have all been proposed 
as a means to enhance software productivity. Recently the term "prototyping" 
has been applied to productivity improvement (SEN80, SEN82). The NASA Goddard 
Software Engineering Laboratory is starting a project to evaluate prototyping 
within the NASA environment. 

First of all, there are several definitions of a prototype. The dictionary 
defines it as an original or model on which something is based or formed. 
However, in looking at several computer glossaries through the year 1981, not 
one of them mentions prototype software development. Thus the term is quite 
new and has yet to be standardized. 

Prototyping is not modeling, another well-used concept. In a model we 
are looking at only a few characteristics of an object. For example, in a wind 
tunnel, we are interested in the airstream past an airplane, not in its 
internal design. However, in a software prototype, we usually mean a complete 
working system, although it may be missing some functionality. Thus we are 
doing more than modeling, or its companion operation, simulation. We wish to 
build a system that demonstrates most of the behavior of the final product. 


PROTOTYPING 

In developing a prototype for NASA we need to understand what a prototype 
is. More importantly, for NASA, the issue of prototyping must answer the fol- 
lowing questions: 

What are the goals of a prototype? Is it to develop the requirements for 
a product? Evaluate its performance? Predict its final costs? 

What are the issues involved? How does one design for a prototype? Does 
the software lifecycle change? Do we want multiple prototypes for different 
phases of the life cycle? How do we use a prototype when built? 

What tools can be used to design a prototype? to build a prototype? to 
evaluate a prototype? 

How does one measure a prototype? How do you know if your prototype was 
successful? Should you invest the cost and build the full system or abandon 
the project? What SHOULD a prototype cost? 10% of the final product, 50%, or 
even 100%? 

M. Zelkowitz 

U. of M. 

1 of 22 


The final question: does prototyping even fit into the NASA environment? 
Every software development environment is unique, and techniques which work 
in one environment might not work in another, so is talking about 
prototyping at NASA even relevant? 

These are all questions which must be addressed, and the current project 
is one data point in evaluating its effectiveness. 


RESEARCH ISSUES 

WHAT IS A PROTOTYPE? There are several different models. In one, it is a 
quick, dirty, throwaway implementation for evaluation purposes. The goal is 
to get something working quickly. This is often useful when the full 
requirements are not well known at the start and the prototype can be used to 
refine these requirements. 

WHAT PROGRAMMING LANGUAGE SHOULD BE USED? There are several views as to 
the language that is to be used in a prototype. A low level language (e.g., 
Fortran, PL/I) can be used as the same implementation language for the full 
system. This leads to greater efficiency in the final prototype, but forces 
the programmer to design more details into the initial implementation. 

There are several high level languages that have been proposed for 
prototyping. Snobol4 and SETL are two such examples. Both allow the programmer 
to avoid many details at a cost in execution speed. Unfortunately, these high 
level languages are not universally available and cannot be used on all 
projects. 


There is also research on very high level languages, often called 
specification or non-procedural languages. These specify what is to happen and 
not how, and thus are good for a prototype where performance is not critical. 
However, these are still very experimental and not yet available in a 
production environment. 


WHAT ARE PROGRAMMER CAPABILITIES? One unfortunate issue in the current study 
of prototyping is that it is a research topic being investigated by 
expert "supercoders". Once prototyped, a system is then built by "mere 
mortals". What will happen if prototyping becomes an accepted technique and 
mere mortals must build the initial design? 


SOFTWARE ENGINEERING LABORATORY 

So far the issue of prototyping has been described in very general terms. 
However, how does it apply to the NASA Software Engineering Laboratory? 
Within the Laboratory, three characteristics of software are under study: Pro- 
files, Models and Methodologies. The effects of prototyping on each of these 
will be described. 





PROFILES. One important aspect of the SEL is simply to measure software. 
Very little is generally known of a quantitative nature about software. This 
is certainly true of prototyping. One important goal is simply to add 
prototyping projects to the SEL data base in order to apply previous SEL 
analyses to this project as had been done to previous projects. Do cost models 
work? Reliability models? Error models? We need to simply characterize this 
software (SEL82). 

MODELS. Once data is collected on prototyping projects, we need to evaluate 
models to see if they apply. Previously the SEL evaluated various cost 
models (Rayleigh, etc.). Do these apply to a prototype? Should they even 
apply? Is another model more appropriate? 

METHODOLOGIES. Finally we need to revise the standard life cycle to 
account for prototypes. How are they designed, built and evaluated? 


FLIGHT DYNAMICS ANALYSIS SYSTEM (FDAS) 

At NASA a new product is being designed which seems like a good candidate 
for prototyping. This system, the Flight Dynamics Analysis System (FDAS), is 
being built to help experimenters try alternative flight dynamics models. 

For example, today if an experiment is to be run (e.g., try a new orbit 
calculation model), the experimenter must access the Fortran source library, 
know which module to modify and make the changes, test the changes, recreate a 
new load module, and then run the experiment. The experimenter must have 
detailed knowledge of the software, and the changes are a time consuming 
operation. 


With FDAS, the experimenter enters the system, and an interactive 
dialogue, controlled by a data base, directs the experimenter to the correct 
module and aids in the change. Thus changes to software are easier and require 
less time and less expertise about the internals of the system. 

Now why is this a good candidate for prototyping? In the past, the software 
built has generally been ground support software. Similar projects have 
been built for the last 15 to 20 years, thus NASA is an expert at such 
software. Issues like: 

Requirements 
Size 
Execution characteristics 
User interface 
Algorithm design 
Cost 

are all well known (or as well known as is possible). Thus prototyping would 
not aid significantly. One can view all previous developments as "prototypes" 
for the next one. 

However, FDAS is a very different system. Most of the factors mentioned 
above are unknown, so a prototype should aid greatly in this evaluation. In 
this case, the prototype has two functions: refine the requirements so that a 
full FDAS implementation can be easily built, and test some of the design 
ideas for feasibility. 

In order to build the prototype, the following general strategy will be 
used: 

(1) A subset of the requirements for FDAS will be written. 

(2) A prototype will be built to these requirements. 

(3) The prototype will be instrumented to collect usage and performance 
data. 

(4) S.E.L. project data will be collected. 

(5) The prototype will be evaluated. 

(6) Features that are not effective will be redesigned. 

(7) The full FDAS system will be built. 

(8) The effectiveness of the prototype on the final product will be 
evaluated. Was FDAS cheaper to build? Will it be more reliable? Will it be 
more efficient? Will it have a better man/machine interface? 

This evaluation will be by automated probes into the system. A logging 
file is being created for each user command. Execution characteristics will be 
added to this file as the prototype executes. A feature in the prototype to 
allow the user full range of changes to the software will be measured to see 
how often the experimenter must go "outside" of the commands provided by FDAS. 
This should greatly help in the design of the user interface. 
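The paper does not describe the log format. As an illustration only, assuming one record per command with a hypothetical `outside` flag that marks uses of the escape feature, the measurement described above could be computed like this (the class, field, and command names are all hypothetical):

```python
import json
import os
import tempfile
import time
from collections import Counter


class CommandLog:
    """Append-only log holding one JSON record per user command.

    The record layout is hypothetical; it is not the FDAS log format.
    """

    def __init__(self, path):
        self.path = path

    def record(self, user, command, outside, cpu_seconds):
        entry = {
            "time": time.time(),
            "user": user,
            "command": command,
            "outside": outside,        # True if the user left the FDAS commands
            "cpu_seconds": cpu_seconds,
        }
        with open(self.path, "a", encoding="utf-8") as log_file:
            log_file.write(json.dumps(entry) + "\n")


def summarize(path):
    """Return (command frequencies, fraction of commands that went outside)."""
    with open(path, encoding="utf-8") as log_file:
        entries = [json.loads(line) for line in log_file if line.strip()]
    frequencies = Counter(entry["command"] for entry in entries)
    outside = sum(1 for entry in entries if entry["outside"])
    return frequencies, (outside / len(entries) if entries else 0.0)


# Usage with made-up commands, written to a temporary file.
path = os.path.join(tempfile.mkdtemp(), "fdas_commands.log")
log = CommandLog(path)
log.record("exp1", "SELECT_MODEL", False, 0.4)
log.record("exp1", "EDIT_SOURCE", True, 2.1)
log.record("exp2", "RUN", False, 15.0)
log.record("exp2", "RUN", False, 14.2)
frequencies, outside_fraction = summarize(path)
print(frequencies.most_common(), outside_fraction)
```

A high outside fraction for a particular kind of change would point to a command the prototype is missing, which is the user-interface question the evaluation is meant to answer.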

It is still too early in the development cycle of FDAS to give any 
conclusions. However, the project is moving along and a prototype should be 
ready for evaluation sometime midway into 1983. This should prove useful in 
adding to our knowledge about this important concept. 


ACKNOWLEDGEMENT 

This paper was supported by NASA grant NSG-5123 to the University of 
Maryland. 


REFERENCES 

(SEL82) Software Engineering Laboratory, Collected Papers - Volume 1, 1982.

(SEN80) ACM SIGSOFT Software Engineering Notes, Rancho Santa Fe Workshop,
October 1980.

(SEN82) ACM SIGSOFT Software Engineering Notes, 2nd Software Engineering
Symposium: Rapid Prototyping, December 1982.


THE VIEWGRAPH MATERIALS
for the

M. ZELKOWITZ PRESENTATION FOLLOW



JARGON 


STRUCTURED PROGRAMMING 
SOFTWARE ENGINEERING 
DISTRIBUTED PROCESSING 
DATA BASE 

PROTOTYPING 





PROTOTYPE 


- THE ORIGINAL OR MODEL ON WHICH SOMETHING 
IS BASED OR FORMED 

- SOMEONE OR SOMETHING THAT SERVES AS AN 
EXAMPLE OF ITS KIND 

IN LOOKING AT SEVERAL COMPUTER GLOSSARIES UP THROUGH 1981, 
NO MENTION IS MADE OF PROTOTYPE. 


USED IN: 

1979 RANCHO SANTA FE WORKSHOP

1982 ACM SIGSOFT RAPID PROTOTYPING WORKSHOP

RECENT DOD REPORTS

SEVERAL THESES STARTING TO APPEAR ON TOPIC




A PROTOTYPE IS NOT A MODEL 

- A MODEL USUALLY INVOLVES LOOKING AT 
ONLY A FEW CHARACTERISTICS 

- A SIMULATION IS USUALLY A MODEL AND NOT A 
PROTOTYPE 

- THE PROTOTYPE NEEDS TO DEMONSTRATE MOST OF 
THE BEHAVIOR OF THE FINAL PRODUCT 





WHAT IS A PROTOTYPE? 


WHAT ARE THE GOALS FOR A PROTOTYPE? 

WHAT ISSUES ARE INVOLVED? 

HOW DOES IT FIT INTO THE SOFTWARE LIFE CYCLE? 
HOW DO YOU USE PROTOTYPES? 

WHAT TOOLS CAN BE USED TO: 

DESIGN PROTOTYPES? 

BUILD PROTOTYPES? 

EVALUATE PROTOTYPES? 

DOES IT FIT INTO THE NASA ENVIRONMENT? 





WHAT IS A PROTOTYPE? 


- "QUICK AND DIRTY" "THROW 
AWAY" FOR EVALUATION 

- SUBSET IMPLEMENTATION 

- HOW DOES IT DIFFER FROM "INCREMENTAL
  DEVELOPMENT"?

LANGUAGE LEVEL? 

- "LOW" (FORTRAN, PL/I, PASCAL) 

- "HIGH" (SETL, SNOBOL4) 

- "VERY HIGH" (SPECIFICATION 
LANGUAGES-GIST) 





WHY PROTOTYPING IS A RESEARCH ISSUE

- PROTOTYPE BY SUPERCODERS 

- DEVELOPMENT BY MERE 
MORTALS 

1. WHAT EFFECT ON DEVELOPMENT
   OF TECHNIQUES?

2. WHAT WILL HAPPEN WHEN
   MERE MORTALS START TO
   PROTOTYPE?





IS NOT REALLY ADDRESSED
YET: MEASUREMENT

- PROTOTYPE USED FOR
  EVALUATION, BUT HOW
  EVALUATED?

- USER "SATISFACTION", "USER
  FRIENDLY"

- PERFORMANCE

- COSTS

- NEED MODELS OF PROTOTYPING
  AND PROBES THAT CAN
  BE ADDED TO PROJECTS TO
  PERFORM EVALUATION





AREAS OF DISCUSSION 


- PROFILES
- MODELS
- METHODOLOGIES



PROFILES 


- LACK OF KNOWLEDGE ABOUT CHARACTERISTICS OF
A PROTOTYPE

- WHAT IS REASONABLE COST RELATIVE TO FULL
DEVELOPMENT?

- WHAT LEVEL OF RELIABILITY SHOULD BE
ACHIEVED?

- WHAT LEVEL OF FUNCTIONALITY IS DESIRED? 

NEED TO COLLECT DATA TO CHARACTERIZE THIS TYPE OF DEVELOPMENT 





MODELS 


- LIFE CYCLE MODELS 

- ERROR MODELS 

- COST MODELS

NEED TO COLLECT DATA TO GENERATE VARIOUS MODELS 
AND TEST EXISTING MODELS ON PROTOTYPES 




METHODOLOGIES 


- HOW TO BUILD A PROTOTYPE 

- HOW TO EVALUATE A PROTOTYPE 

- HOW TO USE PROTOTYPE TO BUILD 
FULL IMPLEMENTATION 






FLIGHT DYNAMICS ANALYSIS SYSTEM 

CURRENT METHOD (E.G., TO TEST NEW ORBIT CALCULATIONS):

- ACCESS FORTRAN SOURCE LIBRARY

- MODIFY PROPER SUBROUTINE

- RECOMPILE AND BUILD NEW LOAD MODULE

- TEST NEW ALGORITHM

- RUN EXPERIMENT

-> THUS NEED DETAILED KNOWLEDGE OF SYSTEM

FDAS:

- ENTER FDAS

- FDAS ACCESSES DATA BASE AND ASKS FOR TASK

- EXPERIMENTER SPECIFIES CHANGE

- FDAS RECOMPILES FORTRAN SOURCE AND BUILDS NEW LOAD MODULE

- RUN EXPERIMENT

-> LESS DETAILED KNOWLEDGE NEEDED OF SOURCE PROGRAM
AND LESS TIME NEEDED TO RUN EXPERIMENT





FACTORS IN SOFTWARE DEVELOPMENT 

GROUND SUPPORT SOFTWARE 


REQUIREMENTS KNOWN 
SIZE KNOWN 
EXECUTION CHARACTERISTICS KNOWN 
USER INTERFACE KNOWN 
ALGORITHM DESIGN KNOWN 
COST KNOWN 




FACTORS IN SOFTWARE DEVELOPMENT 

NEW DEVELOPMENT 


REQUIREMENTS ? 
SIZE ? 
EXECUTION CHARACTERISTICS ? 
USER INTERFACE ?
ALGORITHM DESIGN ?
COST ? 




PROTOTYPE STRATEGY 


- DEFINE A SUBSET OF THE REQUIREMENTS OF A NEW DEVELOPMENT 

- BUILD A PROTOTYPE TO THESE REQUIREMENTS 

- INSTRUMENT THE PROTOTYPE TO COLLECT USAGE AND PERFORMANCE 
DATA 

- COLLECT S.E.L. PROJECT DATA 

- EVALUATE PROTOTYPE 

- REDESIGN FEATURES THAT DO NOT MEET SPECIFICATIONS 

- BUILD FULL IMPLEMENTATION 

- EVALUATE EFFECTIVENESS OF PROTOTYPING ON FINAL PRODUCT: 

- CHEAPER? 

- RELIABILITY? 

- EFFICIENCY? 

- MAN/MACHINE INTERFACE? 





AUTOMATED PROBES


USAGE OF FEATURES 
TIMING DATA 
ERROR COUNTS 

HOW OFTEN PROTOTYPE IS BYPASSED 





CONCLUSIONS 


GENERATE PROFILE OF PROTOTYPE DEVELOPMENT 
IS IT SUCCESSFUL IN NASA ENVIRONMENT?

COME BACK NEXT YEAR!!! 




PANEL #2 


SOFTWARE TOOLS 


J. Goguen/K. Levitt, SRI 
I. Miyamoto, University of Maryland 
P. Szulewski, Draper Labs 




EXPERIENCES AND PERSPECTIVES WITH SRI'S TOOLS 
FOR SOFTWARE DESIGN AND VALIDATION 

by Joseph Goguen and Karl N. Levitt
Computer Science Laboratory 
SRI International 
Menlo Park, CA 94026 

For the past 10 years SRI has had a major research program concerned with
program specification, design and verification. The product of this work
has been an evolving methodology supported by specification languages and
tools for reasoning about specifications. Among the most important tools
are: syntax and type checkers; semantic checkers and theorem provers;
interpreters for processing test data; and analyzers for proving particular
properties of specifications (e.g., the absence of security violations). To
evaluate this methodology, we have undertaken successful large scale
applications to both fault tolerant computing and secure computing. Our
research is now evolving toward an environment that can support the entire
programming lifecycle. Among tools now under construction for this more
comprehensive methodology are structured editors, pretty printers, program
libraries, and program testing systems. We are also considering the use of
graphics, e.g., pictures to display important properties of systems. This
paper briefly describes the current methodology, with emphasis on the role
of specifications in the design process, and presents our experience (and
that of others who have used the techniques) on several significant
projects.

We have found it useful to consider a spectrum of different specification
languages, each most suitable for a different purpose. A major purpose of a
specification language is to support the decomposition and testing of
designs at an early stage, so as to forestall unnecessary effort at later
stages. Sometimes it is only necessary to obtain a prototype system which
demonstrates the feasibility of some concept; in such a case, it would be
desirable to directly execute its specification. In other cases, one wants
to be able to easily verify some particular but subtle property of a system,
such as its ability to recover from certain classes of faults; then one
might want to structure the design to facilitate the proof. In other cases,
one might want to use specifications for documentation, and thus maximize
their understandability and flexibility. In still other cases, one might
want to be able to change easily from one design to another closely related
one for a slightly different application or context. The languages and
environments that are best for one of these purposes will not necessarily be
the best for another, and we have found that there are interesting
trade-offs, for example, between the expressive power of a specification
language and its intuitive simplicity.

Although not denying the utility of specifications, designers have in
general been reluctant to write formal specifications. Perhaps the most
compelling reasons for this have been the absence of a good specification
language with tool support and the absence of examples that can serve as a


K. Levitt 
SRI 

1 of 23 



model of a "good" specification; a specification with too much detail is not
worth the effort. Consequently, formal methods have only been seriously
attempted for those systems where reliability is vital. We see these methods
as now becoming ready for a broader class of systems.

In support of our efforts, we are developing tools that include the following:
the STP theorem prover and its associated Design Verification System
(developed by Schwartz, Shostak, and Melliar-Smith); PHIL, a meta-programmable
context sensitive structured editor (developed by Goguen and Lamport);
Pegasus, a system for support of graphical programming; and OBJ, an ultra
high level programming language based on rewrite rules and abstract data
types (developed by Goguen). We are also doing some related work on acquiring
and expressing requirements, and on performance analysis.

We have had particular success with the specification and verification
of two classes of systems for which reliability is vital:
fault-tolerant systems for aircraft control and secure operating
systems. For the former, we have developed a fault-tolerant computer
called SIFT (Software Implemented Fault-Tolerance), and have verified
that it is correct with respect to a reliability model. Several
subtle bugs in our original software were uncovered in the process of
specification and verification. The most significant was that the
results of infrequently executed tasks were not voted on sufficiently
often and, hence, were not adequately protected against faults.
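The protection that voting gives replicated tasks can be illustrated with a generic majority voter over the values that several processors report for the same task. This is a textbook sketch, not SIFT's voter: the function name majority_vote and the choice to return None when no strict majority exists are assumptions. It also shows why the bug mattered: a result that is not voted on is never compared against the other replicas, so a faulty copy can go unmasked.

```python
from collections import Counter


def majority_vote(values):
    """Return the value reported by a strict majority of replicas, else None.

    values: results of the same task computed on different processors.
    """
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    # A strict majority of n replicas masks up to (n - 1) // 2 faulty values.
    return value if count * 2 > len(values) else None
```

With three replicas reporting 42, 42, and 17, the voter returns 42; with three different values, or a 2-2 tie among four, it returns None.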

For the secure systems work, we (in cooperation with Honeywell Systems and
Research, Ford Aerospace, and several other companies) have worked on
several secure operating systems, ranging from small guards and kernels to a
full, general purpose operating system (PSOS, Provably Secure Operating
System). For PSOS, in particular, the salutary effects of producing formal
specifications were:

- A clean decomposition of the system into modules that are
  largely independent

- Minimization of the total number of modules through the
  identification of multipurpose, parameterized modules

- A clean user interface

- A portable design in that each level in the hierarchy provides
  an interface independent of how it is implemented

- Identification of easily-formulated properties that were used
  as the basis in proving a design to be secure.





THE VIEWGRAPH MATERIALS 
for the 

J. GOGUEN/K. LEVITT PRESENTATION FOLLOW 




WORK AT SRI INTERNATIONAL ON SOFTWARE 
SPECIFICATION AND REQUIREMENTS 


JOSEPH GOGUEN 
KARL N. LEVITT 

COMPUTER SCIENCE LABORATORY 
SRI INTERNATIONAL 
MENLO PARK, CA 





OUR MESSAGE 


A "new" paradigm for software 
development is gaining acceptance 


FORMAL (i.e., precise) REQUIREMENTS 
and SPECIFICATIONS are now possible 
for most systems 

Experimental languages and tools for
analyzing specifications and
requirements are available, e.g., SRI's
Hierarchical Development Methodology
(HDM) and specification languages
SPECIAL and OBJ

Experiences with these techniques
have been positive

* SIFT (Software Implemented
  Fault Tolerance) ultrareliable
  flight-control computer

* PSOS (Provably Secure
  Operating System)

These techniques give promise of 
reducing lifecycle cost 




PREFERRED APPROACH TO SOFTWARE 
DEVELOPMENT 

Requirements -> 1st Design -> 2nd Design -> Implementation -> Production Systems

Prototypes <-> 1st Design
Prototypes <-> 2nd Design




ACTIVITIES AT EACH STAGE 


Formal specification of functional         Supported in HDM and OBJ
  behavior

Verification of specs                      Supported in HDM

Testing of executable specs with real      Supported in OBJ
  and symbolic data

Interstage consistency (including design   Supported in HDM
  and code verification)

Pictorial descriptions of specs and code   In progress





APPROACHES TO INTERSTAGE REFINEMENT 


- Vertical refinement — Hierarchical 

decomposition using Abstract Data 
Types 

- Horizontal refinement — Building a 

module out of existing modules 

- Program transformation — Improving 

the performance of a program while 
preserving its functional behavior 




WHAT IS A SPECIFICATION 


A specification is the DEFINING statement 
of a system’s BEHAVIOR 

It should resolve UNAMBIGUOUSLY questions
about how the system should behave
in ANY situation

Inputs --> System --> Outputs

A spec is a BLACK-BOX description
UNAMBIGUOUS => specs are FORMAL





QUALITIES OF A "GOOD" SPECIFICATION 


- Concise

- Easy to produce (compared with an
  implementation)

- Readable

- Executable (in support of testing)

- Support automated reasoning
  (e.g., verification)

- Allow for performance analysis and/or
  simulation





FEATURES OF A SPECIFICATION LANGUAGE 


- Allow specification just in terms of
  "callable" functions. E.g., a "file"
  system is definable in terms of
  CreateFile, OpenFile, CloseFile,
  WriteFile, ReadFile, MovePointer

  An OBJ specification consists of
  equations, e.g.,

  ReadFile(WriteFile(CreateFile(), val)) = val
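Because OBJ is based on rewrite rules and abstract data types, an equation like the one above can be checked by evaluating both sides on concrete values. The Python sketch below models a file as an immutable tuple of written values and ReadFile as returning the first value; these representation choices and the function names are assumptions for illustration, not OBJ syntax or semantics.

```python
def create_file():
    """CreateFile(): a new, empty file (modelled as an immutable tuple)."""
    return ()


def write_file(f, val):
    """WriteFile(f, val): the file f with val appended."""
    return f + (val,)


def read_file(f):
    """ReadFile(f): the first value in the file (reading from the start)."""
    if not f:
        raise ValueError("ReadFile of an empty file is not defined here")
    return f[0]


def equation_holds(val):
    # ReadFile(WriteFile(CreateFile(), val)) = val
    return read_file(write_file(create_file(), val)) == val
```

Under this model the equation holds for any value, and a second write does not change what ReadFile returns, which is one possible reading that the single equation leaves open.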

- Allow specification in terms of
  abstract (i.e., high-level) data structures

  An HDM specification would represent
  a "file" in terms of a semi-infinite
  array (FileVal) and a pointer
  (FilePointer)

  WriteFile(val)
    EXCEPTION: FileFull
    EFFECTS:
      'FilePointer() = FilePointer() + 1
      'FileVal('FilePointer()) = val
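The HDM version describes WriteFile as a state change, where a primed name such as 'FilePointer denotes the value after the call. A minimal Python sketch of that reading follows. The class name HdmFile, the attribute names, and the finite capacity that triggers FileFull are assumptions: the slide models FileVal as semi-infinite and does not state when FileFull is raised.

```python
class FileFull(Exception):
    """Raised when WriteFile would exceed the (assumed) file capacity."""


class HdmFile:
    def __init__(self, capacity=1024):
        self.capacity = capacity   # assumed bound that triggers FileFull
        self.file_pointer = 0      # FilePointer
        self.file_val = {}         # FileVal: index -> value (sparse array)

    def write_file(self, val):
        # EXCEPTION: FileFull is checked before any state changes.
        if self.file_pointer >= self.capacity:
            raise FileFull()
        # EFFECTS:
        #   'FilePointer = FilePointer + 1
        #   'FileVal('FilePointer) = val
        new_pointer = self.file_pointer + 1
        self.file_val[new_pointer] = val
        self.file_pointer = new_pointer
```

Because the exception is tested first, a rejected write leaves FilePointer and FileVal unchanged, matching the usual reading of an HDM exception as "no effects".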



FEATURES (cont.) 


PARAMETERIZATION, i.e., using
a library of previously developed
specifications

a "secure" file could be specified as
SecureFile(Contents, SecurityLevel)
where:

  Contents is any type

  SecurityLevel is a "Partially Ordered
  Set"


Logical and Set statements (including
infinite sets)

Finding an element val in a file:
EXISTS i : FileVal(i) = val

Number of appearances of element
val in file:

CARDINALITY({ i | FileVal(i) = val })
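Over a finite file, the two set-theoretic statements translate directly into executable checks. The sketch assumes FileVal is available as a Python dict from index to value, and the function names are hypothetical; SPECIAL's infinite sets have no such direct counterpart.

```python
def contains(file_val, val):
    """EXISTS i : FileVal(i) = val"""
    return any(v == val for v in file_val.values())


def occurrences(file_val, val):
    """CARDINALITY({ i | FileVal(i) = val })"""
    return len({i for i, v in file_val.items() if v == val})
```

For FileVal = {1: "a", 2: "b", 3: "a"}, "a" exists in the file and appears twice.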





TOOLS IN SUPPORT OF HDM AND SPECIAL

specs
  |
  v
Syntax and Type Checker
  |
  +--> General Design Verifier (1)
  |
  +--> Security Verifier (2)
  |
  +--> Code Verifier (3)

Notes:

1. Verifies properties of spec, e.g., "File will
never overflow"

2. Checks for information flows in violation of the
Multi-Level Security Model

3. Languages supported: Pascal, Jovial, Fortran 77
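The kind of rule the security verifier enforces can be pictured with the usual multilevel model, in which a level is a classification plus a set of categories and levels form a partial order (matching the "Partially Ordered Set" parameter of SecureFile). Information may flow from level a to level b only if b dominates a. The Python sketch below checks a list of such flows; the classification names, the Level class, and the flow_violations function are assumptions for illustration, and SRI's tool analyzes SPECIAL specifications rather than explicit flow lists.

```python
from dataclasses import dataclass

# Assumed linear ranking of classifications; categories make the order partial.
RANKS = {"UNCLASSIFIED": 0, "CONFIDENTIAL": 1, "SECRET": 2, "TOP SECRET": 3}


@dataclass(frozen=True)
class Level:
    classification: str
    categories: frozenset = frozenset()

    def dominated_by(self, other):
        """Partial order: self <= other."""
        return (RANKS[self.classification] <= RANKS[other.classification]
                and self.categories <= other.categories)


def flow_violations(flows):
    """Return the (source, target) pairs whose target does not dominate the source."""
    return [(src, dst) for src, dst in flows if not src.dominated_by(dst)]
```

A flow from UNCLASSIFIED to SECRET passes, a flow from SECRET down to UNCLASSIFIED is reported, and so is a flow from SECRET with category NATO to plain SECRET, because the two levels are incomparable in the categories.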





TOOLS IN SUPPORT OF OBJ 


Syntax Checker <-- specs <--> Editor
                     |
                     v
  Test Cases --> Interpreter





REFINEMENTS FOR SIFT ULTRARELIABLE COMPUTER 


I/O Model            System SAFE => "all tasks correct"
    |
Replication Model    Task replicated; values voted on each
    |                execution of tasks
Activity Model       Task activities: startup, broadcast of values,
    |                vote, execute, synchronization
Operating System     SPECIAL specs for OS routines: scheduler, voter,
    |                dispatcher, buffer manager, etc.
Pascal Programs      Code for each routine
    |
BDX-930 Code





EXPERIENCE WITH SRI's FORMAL TECHNIQUES 


Organization      System              Specs   Design   Code
                                              Proof    Proof

SRI               SIFT                X       X        X
                  PSOS                X       X
                  Real-time OS        X
Ford Aerospace    KSOS-11             X       X
Honeywell         SCOMP               X       X
Sytek             SACDIN              X       X
Merdan            Secure msg system   X       X





PSOS DESIGN HIERARCHY

LEVEL   PSOS ABSTRACTION OR FUNCTION
16      USER REQUEST INTERPRETER *
15      USER ENVIRONMENTS AND NAME SPACES *
14      USER INPUT-OUTPUT *
13      PROCEDURE RECORDS *
12      USER PROCESSES * AND VISIBLE INPUT-OUTPUT
11      CREATION AND DELETION OF USER OBJECTS *
10      DIRECTORIES (*) [C11]
9       EXTENDED TYPES (*) [C11]
8       SEGMENTATION AND WINDOWS (*) [C11]
7       PAGING [8]
6       SYSTEM PROCESSES AND INPUT-OUTPUT [12]
5       PRIMITIVE INPUT/OUTPUT [6]
4       ARITHMETIC AND OTHER BASIC OPERATIONS *
3       CLOCKS [6]
2       INTERRUPTS [6]
1       REGISTERS (*) AND ADDRESSABLE MEMORY [7]
0       CAPABILITIES *

*      = MODULE FUNCTIONS VISIBLE AT USER INTERFACE.
(*)    = MODULE PARTIALLY VISIBLE AT USER INTERFACE.
[I]    = MODULE HIDDEN BY LEVEL I.
[CI]   = CREATION/DELETION ONLY HIDDEN BY LEVEL I.





S/W Eng Methodology

PROTOTYPE

PROTOTYPING: Feedback to user is a fuzzy concept

EXAMPLE: Use of scenarios

ALSO need feedback to the designer/coder,
e.g., performance models

TOPICS: Early in process

This roughly corresponds to level of abstraction





GENERAL MOTIVATION 


To provide a precise, scientific way to discover:

a) What users want or need
   (requirements)

b) What “linguistic structures”* work best for a given purpose
   (user interface design)

c) What is really going on in a given social context
   (social system analysis)


* may be graphical, textual, speech, or mixed media; all are “linguistic” in the sense of
being hierarchically structured into atoms, phrases, and discourse units.





REQUIREMENTS 


Two major components:

1. How the client will use the system:
information flow at the interface, inside the system, and in the client's
organization.

2. Client's criteria for evaluation of the system:
a hierarchy of values; may be subjective factors and organizational
factors, as well as objective and individual factors.

These lead to two representation systems:

1. Abstract Data Flow Diagrams

2. Value System Trees

Note that both are graphical in nature.




ABSTRACT INFORMATION FLOW 

A. MOTIVATION

We want to characterize information by its use and intention (social meaning),
not by its physical representation
(vs. operations research).

This can be done if we look at the information from the viewpoint of those who
use it.

Such information is available in the users' language.

B. DATA FLOW DIAGRAMS

Graphs, with “files,” which represent some type of data, generally structured, and
“actions,” which are operations on that data.

We can have both iteration and recursion in DFDs.

Also hierarchical structure.

C. ABSTRACT DATA FLOW DIAGRAMS

“Abstract” means independent of representation: data are characterized by the
relations among the operations on them.
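A data flow diagram of files and actions can be held as a small bipartite graph in which every flow links a file to an action or an action to a file. The Python sketch below records that structure; the class and method names, the bipartite well-formedness check, and the traffic-flavored node names in the usage note are assumptions, and hierarchical refinement of actions is left out.

```python
from dataclasses import dataclass, field


@dataclass
class DataFlowDiagram:
    files: set = field(default_factory=set)     # data stores ("files")
    actions: set = field(default_factory=set)   # operations on that data
    edges: set = field(default_factory=set)     # (source, target) flows

    def add_flow(self, source, target):
        """Add a flow; it must link a file to an action or an action to a file."""
        ok = ((source in self.files and target in self.actions)
              or (source in self.actions and target in self.files))
        if not ok:
            raise ValueError(f"flow {source!r} -> {target!r} must link a file and an action")
        self.edges.add((source, target))

    def inputs_of(self, action):
        return {s for s, t in self.edges if t == action}

    def outputs_of(self, action):
        return {t for s, t in self.edges if s == action}
```

For example, with files "sensor readings" and "signal timings" and the action "compute timings", the flows into and out of the action are accepted, while a direct flow between the two files raises ValueError. Cycles are not rejected, which leaves room for the iteration and recursion mentioned above.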










ABSTRACT DATA FLOW DIAGRAM 


"abstract car flow processor" 
can be compiled into a simulation 





VALUE SYSTEM TREE

[Figure: value system tree for REGULATE TRAFFIC FLOW, with nodes SERVICE, SAFETY, RELIABLE, FAIR, and EFFICIENT; FUNCTIONAL REQS in the ADFD]


Can be used to organize: 

Management effort 
Organizational structure 
Accounting 

Structured walkthroughs 
Acceptance tests 
Redesign criteria 

Natural visualization 

Can be used to compile tools for later phases. 




USER INTERFACE DESIGN OF SOFTWARE TOOL SYSTEM AS
A TECHNOLOGY TRANSFER VEHICLE

Isao Miyamoto

Department of Mathematics and Computer Science
University of Maryland, Baltimore County
Catonsville, MD 21221


ABSTRACT 

The paper introduces design considerations of an on-going research
project for developing an effective and easy-to-use tool system that
supports the entire maintenance phase. The primary focus is the design
of an "intelligent" user interface mechanism. By analyzing why existing
tools and tool systems are not used very effectively, we can define
users' requirements for the user interface mechanisms, specify design
criteria of user interface functions, and introduce some features of
the implementation. Because this project is still in process, intermediate
evaluation and expected effectiveness are discussed. The author believes
that only a well-designed tool system can be a powerful software
engineering technology transfer vehicle.



1 INTRODUCTION


The role of the software system is extremely important in a computer-based system.
The technology to develop and maintain quality software is the key to the
advancement of computer applications; such technology is called software
engineering.

We have surveyed current techniques, methodologies, and tools (or tool systems)
for producing high quality software [1]. The most serious finding is that
although many techniques, methodologies, and software tools are available,
they are not used very much or very effectively in real software production
environments [2]. Sometimes, programmers do not know what items are available
or how to use them. Sometimes, their productivity and the quality of their software
fail to improve anyway. Later we will discuss some reasons for the failures.

The author has experience in the development of a large-scale integrated tool
system. This project was carried out in the author's former company from 1976
to 1979. We tried to develop a software support system named Software
Development & Maintenance Support System (SDMSS) [6] that was supposed to cover
the entire software life cycle. Although we had developed some parts of the
system, I frankly think we failed to develop an easy-to-use and effective tool
system. We did not spend enough time designing the framework of the system,
such as maintainability, portability, database, command language, graphics
capability, etc. We simply tried to integrate many attractive ideas. We
required a very large host computer, much programming effort, many resources
to execute this system, etc. We did not have any clear methodology for using
all of the functions of the system. We learned many lessons from the failure
of this project.

I. Miyamoto

U. of M. 

1 of 12 





In addition to this experience, the author has promoted modern software
engineering techniques in the software industries. Through this type of
professional development, the author made valuable findings about the failure
of technology transfer. To summarize, transfer of technology is very
difficult if we lack tools that realize and support the proposed methodology
or technique.

From those experiences, we discovered why it is difficult to transfer software
engineering technology from the research environment to the production
environment. The production environment is in great need of these new ideas.
We also realize why existing individual tools and tool systems are not used
very much or very effectively, although they were developed to be used frequently.
Some of the problems come from management, some come from human factors, and
many are associated with the tool or tool system itself.

However, many of those reasons may be integrated as a "technology transfer
problem." We would like to introduce some ideas for the transfer of software
engineering tools.


1.1 Why software tools are not used

Our survey [3] and some other surveys [4, 5] indicate that we have many
individual tools and several tool systems. However, almost none of these is
used effectively.

For individual tools, some of the major reasons are as follows:

1) Most tools do not have a clearly defined methodology, and only the program
code is available. Rarely is a user's guide available.

2) The economic effectiveness of most tools has not been verified.

3) Because of the difficulty in defining criteria for evaluating the quality
and effectiveness of software tools, many tools have not been tested by
users.

4) Many tools are not evaluated at all.

5) Some tools have been evaluated, but they are clearly not cost-effective.

6) It is very hard to use or describe some tools.

7) Documentation (user's manual, design specification, maintenance
manual, etc.) is poor. Sometimes there is no available documentation.

8) Tools assume many predetermined environmental conditions which are not
documented. Most of the time, these conditions do not match the real
conditions of the users.



9) Usability of tools is very poor because of lack of proper methodology.

10) Sometimes the tool itself does not properly support user activity
because of poor understanding of software production process models.

11) The reliability of the algorithms, the quality of the implementation, and
the efficiency of the tool are not sufficient for the user.

12) Many individual tools are not designed to have common input/output
formats.

13) Users strongly resist tools that were designed at other organizations.

14) Special-purpose tools serve a very small audience.

15) Programmers generally resist new or foreign languages and tools. Expert
programmers are the most resistant, as they are the most conservative.

16) If the development group has a bad reputation, most programmers do not
want to use the products.

17) Sometimes, the particular tools have a bad reputation.

18) Sometimes tools do not fit the existing working criteria.

19) Many tools do not have extensibility or modifiability to accommodate
each user's environment.

20) The maintenance of the tool itself has not been taken into account properly,
and the quality and functionality of the tool become ineffective over time.

21) The portability of the tools is very poor.

Several of these points can be addressed by designing our tools to improve the
situation. For example, reasons 1, 6, 7, 8, 9, 12, 19, 20, and 21 all depend
on design, or on a support methodology for applying a tool's capabilities to the
user's proper production activities.




1.2 Why tool systems are not used


Tool systems are collections of many individual tools. There are two types of
tool systems: heterogeneous and homogeneous.

The first type of tool system integrates different types of tools and supports
no common methodology for using the component tools. The second type also
integrates individual tools but supports some common methodology for using
the component tools. UNIX is representative of the first type, and SDMSS
is representative of the second type [6].

Each type has both merits and demerits; neither is a perfect tool system.
Existing tool systems have the following major problems:


1) In general, tool systems tend to grow bigger and bigger.
To create and use a tool system requires a large memory space, many
computer resources, a large database, a large-scale computer, sophisticated
terminal devices, etc.

2) A particular tool system is very expensive to use.

3) The development cost of a tool system itself is extremely high.

4) Components of tool systems are tightly integrated, and so adding or deleting
tool functions is quite difficult.

5) The maintenance of a tool system itself is tremendously expensive;
in fact, sometimes it is impossible to maintain.

6) The input and output of the components are not uniform.

7) Many user interfaces of tool systems depend on the host operating
system, and they are not easy to use.

8) Because many functions depend on the specific hardware or operating system,
the portability of the tool system to other environments is very poor.

9) Very few tool systems are designed to support both expert programmers
and novices.

10) Most tool systems are not designed to support groups of users.

11) Few tool systems are graphics-oriented, and so many users must use text-type
information.

12) Most tool systems do not have any global-level methodology, and are
just a collection of individual tools.
Sometimes, tool systems enforce a very biased (e.g., improper and always the
same) methodology on users.

13) Management of the activities done through the tool systems is not available.

14) It is difficult to cover the entire software life cycle because of the
current level of sophistication of software technology.


A tool system has many problems beyond those of individual tools. This is why very few
existing tools or tool systems are used very much or very effectively.

In order to increase software productivity and software quality, we must
design and use support tools (or tool systems) that aid at least our software
development and maintenance activities.

Therefore, a question we must answer is how to design effective and easy-to-use
tools or tool systems.

A tool or tool system can be a very powerful medium that transfers software
engineering methodology from researchers to practitioners in the real world.
Carefully designed tool systems can very effectively transfer technology.
We believe that tool design is only one strong mechanism to aid transfer
of existing technology. We also believe that tools must be easy-to-use and
cost-effective to make people apply new software engineering technology.

: PROBLEMS 

V* ite d**;gning 1 rather tmbitiou* tool system to luppott eottwire miintenmc* 
ictsvit*** which 1* called Pundora’s Box t ’ 1 The tool# at# available 
individual!.’ now Taking into account previou* ma ’ 0 r problems. w* have 
'artfully defined our design criteria 

Concerning th# big tcile of tool *v#t#m# vFroblem * l ' the Pandora’* Bo* 
i* designed to have levtral *ub*y»teme which are independently executable 
Entire whole functionalitie* of tool *y*tea are going to be very large but each 
component 1* de«ioned to be very compact and to be executable on a eupet micro- 
computer Therefore the usaga co*t 1 1 expected to be very low (Problem •»' 

These component# and fane 10ml 1 tiei are designed to u#e available tool 
function* e g full use .•( UNIX environment' Then we will avoid wasting 
much non* v indicating tho** function* Th* * t : ; t u : es of subsystem* are t * be 
modular and ill necessary interaction; at* to be done via database (Problem 
• d ’ r hen the svetem *tructure is verv flexible ^ J each function i* designed 
to be rather small to increase saintainabilitv of the toot system itself 
i ?r obi to *5 ' 

Interaction* between loot function* art done by databa*e v .* f software 
knowledge base' and input output format* are common a* in UNIX (Problem tie' 

With the UNIX environment as a host for this tool system, the portability of the
system is assured to some extent (Problem #?). This tool system is graphics-
oriented, so users can use the graphics capabilities of a color graphics terminal
and a color x-y plotter. In the system, graphics are neither secondary to nor
a substitute for text-type commands; the policy calls for graphics first and
text next (Problem #11). The tool system is designed to keep all of the usage
history and to register individual scenarios. Using a scenario system and a
hierarchical menu system, we can manage the user's activities and collect some
management data (Problem #13). We have tried to apply the latest techniques
to the design of Pandora's Box, and we limit the usage of the system to certain
phases of the software life cycle.

We selected only already-evaluated techniques and tools (Problem #14). The
problems related to the user interface and methodology (Problems #?, #?, #10, and
#11) are described in detail in the next section.


I. Miyamoto 
U, of M. 

5 of 12 




3 DESIGNING USER INTERFACE MECHANISM

We are designing a software-maintenance support tool system named "Pandora's
Box" and would like to introduce some ideas from this project. These ideas
are related to the user interface design of this tool system. The user
interface is designed to have two fundamental functions for users. One is a
three-level menu hierarchy to serve different scenarios to various types of
programmers, from expert programmers to novice programmers. The other function
is the knowledge-base guidance mechanism for those users.

3.1 Basic requirements of user interface functions

We assumed three types of users: novice users, infrequent expert users, and frequent
expert users. Each type of user has different requirements for the user
interface functions.

For example, novices need (quoted from [9]):

• utmost in clarity and simplicity,
• small number of user commands,
• meaningful commands (not a single letter, and not with complex syntax),
• lucid error messages and help facilities, and
• reinforcement from success.

Novices may want a computer-directed mode and the system's "friendliness."
Infrequent expert users prefer:

• simple commands,
• meaningful commands,
• easy-to-remember operations, and
• prompting.

On the other hand, frequent expert users want:

• powerful commands, command strings, and user-defined commands,
• minimal number of keystrokes,
• brief messages (with access to detail on request), and
• high-speed interaction.

Experts demand user control and the system's "intelligence."

In order to satisfy all user levels, how should we proceed?

We might do the following: (1) expect a "graceful evolution" of users
themselves, (2) apply "information hiding" techniques to the user interface
mechanism, or (3) have a hierarchical menu selection system with
"intelligence" and "individual" scenarios. In general, a menu selection
user interface may give us:

• little training,
• little memorization,
• clear structure for user activity,
• ease in designing individual small tool functions, and
• simplicity in software structure.

But because of the predetermined entries in the menu, usage can be somewhat
restrictive. In order to design a good menu selection system, we need to
make a big development effort.


3.2 Design criteria of hierarchical menu system

By taking into account the basic requirements of user interface functions,
referring to the material [9], and borrowing some ideas, we have set up the
following design criteria:

• apply an intelligent user guidance mechanism,
• use a small number of choices per screen,
• consider semantic organization and give titles,
• show hierarchy by graphic design,
• permit simple back, left, right, up, and down traversals in the
  menu hierarchy,
• use a proper combination of colors,
• permit type-ahead,
• put the most important and frequent choices first,
• begin choices with a keyword, if possible, and
• require an enter key or use light-pen mode consistently.

Some other considerations are:

• display rate,
• response time,
• help/explain facilities,
• shortcuts/menu macros, and
• human reaction to colors.

3.3 Some features of user interface

a. Hierarchical scenarios

The scenario hierarchy of Pandora's Box consists of three levels of menus.

The top level is the so-called "methodology-oriented menu" (or scenario), and
it will provide users with "how-to-maintain scenarios" that guide them through
all of the necessary maintenance activities. Those activities include the
detailed phase plans of each type of maintenance. The scenarios are prepared
in a flexible way for emergency maintenance, planned maintenance, deferred
maintenance, and preventive maintenance. The work breakdown structures and
necessary procedural steps are the elementary source of this level. When users
interact with this scenario, they can get complete guidance on how to
maintain their programs and data sets without precise knowledge about maintenance
activities. The users do not need any written guideline to maintain their
software; they need only follow the scenario.

The second-level menu is the "how-to-select-proper-tool-functions menu," used to take
the necessary actions guided from the top-level menu. The elementary information
of this level is a list of the tool functions provided by Pandora's Box. The
tool system will be expanded to contain all of the functions necessary to do
all of the maintenance activities, from maintenance requirements analysis to
validation of maintained software. The menu at this level is constructed based on
an activity-tool function matrix. The third-level menu contains the
information about "how to use a particular tool function." This level gives
users the exact information about the user commands to execute a particular tool
function.



[Figure: user types (novice or expert user, infrequent expert user, frequent expert user) shown beside the three menu levels: HOW-TO-MAINTAIN MENU (activity menu), HOW-TO-SELECT TOOL FUNCTION MENU (tools menu), and HOW-TO-USE FUNCTION MENU (command menu), together with the KNOWLEDGE BASE SYSTEM and the TOOL BOX]

FIG 1 Hierarchical Menu System


Figure 1 is a representation of this menu hierarchy. Users can access Pandora's Box at any level, from
the top (in which case they will be guided smoothly to the next-level menu and finally to the command-
level menu) down to the lowest level of the hierarchy, to achieve some particular maintenance activity. The system
will record the history of activity-profile use for each user, so that it can provide the best scenario
to each user individually the next time that user accesses the system. The system will guide users by scan-
ning the menu hierarchy up and down. The top-level menu provides users with the methodology to main-
tain software, so users no longer need a maintenance guidebook or a users manual for the tool system itself.

The system will guide users and provide the necessary information and functions to do the necessary activities.
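The three-level hierarchy and its different entry points can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not Pandora's Box itself: the names `UserType`, `MenuLevel`, `ENTRY_LEVEL`, and `guided_path` are hypothetical, and the mapping of each user type to a starting level is an assumption based on the layout of Figure 1.

```python
from enum import Enum


class UserType(Enum):
    NOVICE = "novice"
    INFREQUENT_EXPERT = "infrequent expert"
    FREQUENT_EXPERT = "frequent expert"


class MenuLevel(Enum):
    # Listed from the top of the hierarchy to the bottom.
    HOW_TO_MAINTAIN = 1       # methodology-oriented (activity) menu
    HOW_TO_SELECT_TOOL = 2    # tools menu
    HOW_TO_USE_FUNCTION = 3   # command menu


# Assumed entry points: novices start at the top, frequent experts at the bottom.
ENTRY_LEVEL = {
    UserType.NOVICE: MenuLevel.HOW_TO_MAINTAIN,
    UserType.INFREQUENT_EXPERT: MenuLevel.HOW_TO_SELECT_TOOL,
    UserType.FREQUENT_EXPERT: MenuLevel.HOW_TO_USE_FUNCTION,
}


def guided_path(user: UserType) -> list[MenuLevel]:
    """Menu levels a user passes through, from the entry level down to the
    command menu, which finally invokes a function in the tool box."""
    start = ENTRY_LEVEL[user].value
    return [level for level in MenuLevel if level.value >= start]


for user in UserType:
    print(user.value, "->", [level.name for level in guided_path(user)])
```

Because every path ends at the command menu, a novice and a frequent expert reach the same tool functions; they differ only in how much guidance they pass through on the way.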




b. Knowledge-base guidance 

When the system guides users, it refers to a knowledge base that contains software error information
and maintenance pattern information. The knowledge base contains exactly two types of error information:
one is the general-tendency error occurrence distribution, and the other is the error history of each user,
collected during their use of Pandora's Box. The latter type of error information is analyzed according to the
target program and individual user.

Some basic ideas of the error information collection mechanism within the automated tool system are given in
the previous article [6].



FIG 2 Knowledge Base System




In the maintenance phase in general, and especially in corrective
maintenance, testing modified programs in an efficient way is most
important and necessary. As emphasized in [?], there are rather
clear relationships between testing techniques and error types: to detect
a particular type of error, we need some specific techniques.

We examined these relationships and made up a testing technique-error type
matrix in the knowledge base system [7].
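A testing technique-error type matrix of this kind can be held as a simple mapping and queried by error type. The sketch below is hypothetical throughout: the error types, technique names, and matrix entries are placeholders chosen for illustration, since the paper does not list the contents of the Pandora's Box knowledge base.

```python
# Hypothetical matrix: rows are error types, columns are testing techniques,
# and True marks a technique considered effective for that error type.
TECHNIQUES = ["boundary-value testing", "path testing", "code inspection"]

MATRIX = {
    "boundary error": [True, False, True],
    "missing branch": [False, True, True],
    "interface mismatch": [False, False, True],
}


def techniques_for(error_type: str) -> list[str]:
    """Techniques marked as effective for one error type (none if unknown)."""
    row = MATRIX.get(error_type, [False] * len(TECHNIQUES))
    return [name for name, hit in zip(TECHNIQUES, row) if hit]


def techniques_for_all(error_types: list[str]) -> list[str]:
    """Union of suitable techniques for several expected error types,
    kept in the matrix's column order."""
    wanted: set[str] = set()
    for error_type in error_types:
        wanted.update(techniques_for(error_type))
    return [name for name in TECHNIQUES if name in wanted]


print(techniques_for("missing branch"))
```

Keeping the matrix as data rather than code means new error types or techniques can be added to the knowledge base without changing the lookup logic.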

Referring to the error information, we can get the general tendency of the
error occurrence distribution, and by referring to the user's individual
history, we may adjust this distribution into an individual, user-oriented
one. Based on this knowledge, at the time the user signs on to
Pandora's Box, we can provide the optimum individual maintenance scenario.

This scenario is based on a prioritized menu, so that the user can continue his/
her most necessary and effective activities.
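One way to read this adjustment is as a weighted blend of the general error distribution with the user's own error history, followed by ordering menu entries by the blended weight. The blending rule, the weight `alpha`, and all function names below are assumptions for illustration; the paper does not say how the adjustment is computed.

```python
from collections import Counter


def normalize(counts: dict) -> dict:
    """Scale non-negative counts so they sum to 1 (empty input stays empty)."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def individual_distribution(general: dict, user_history: list[str],
                            alpha: float = 0.5) -> dict:
    """Blend the general error distribution with one user's error history.

    alpha is the assumed weight of the user's own history; alpha = 0
    reproduces the general distribution unchanged.
    """
    g = normalize(general)
    u = normalize(Counter(user_history))
    kinds = set(g) | set(u)
    return {k: (1 - alpha) * g.get(k, 0.0) + alpha * u.get(k, 0.0) for k in kinds}


def prioritized_menu(dist: dict) -> list[str]:
    """Error types (and so the activities addressing them), most likely
    first; ties are broken alphabetically."""
    return sorted(dist, key=lambda k: (-dist[k], k))


general = {"logic": 40, "data": 35, "interface": 25}
history = ["interface", "interface", "data"]
print(prioritized_menu(individual_distribution(general, history)))
```

With these made-up numbers, a user whose history is dominated by interface errors sees interface-related activities first, even though logic errors lead the general distribution.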

4 REMARKS 

A technology transfer problem is not easy, because it is related to
education, training, techniques, methods, supporting tools, management
organization, and human factors. Also, we don't yet know what should be
transferred. Unless we know it, we can't discuss how we should do technology
transfer. This may cause several questions, like "What is a really useful
software engineering technology to be transferred?" or "From whom to whom?"
Besides desk discussions, we must take some approaches to improve
the situation of existing tool usage. We hope that some of our ideas on the
design of user interfaces for tool systems may show some possible directions.

Finally, a "friendly" and "intelligent" user interface mechanism in well-
designed tool systems could be a powerful technology transfer vehicle.

5 Acknowledgement 

The author would like to express his special thanks to Drs. Victor Basili,
Ben Shneiderman, Kouichi Kishida, and to his research associates for their
advice and support.



REFERENCES 

[1] I. Miyamoto, "High quality software production techniques," T1S Pub.
Co., 1982, Tokyo, Japan.

[2] I. Miyamoto, "Management of software maintenance (No. 5)," bit, Sept.
1982, Kyoritsu Pub., Tokyo, Japan.

[3] I. Miyamoto, "Management of software maintenance (No. 4)," bit, Aug.
1982, Kyoritsu Pub., Tokyo, Japan.

[4] Reifer Consultants, "Software Tools Directory."

[5] NBS, "Software Tools Directory," Oct. 1980.

[6] I. Miyamoto, "Reliability Evaluation and Management for An Entire
Software Life Cycle," The 2nd Software Life Cycle Management Workshop,
1978.

[7] I. Miyamoto, et al., "Conceptual design of Pandora's Box," to appear.

[8] I. Miyamoto, "Management of software maintenance (No. 3)," bit, July
1982, Kyoritsu Pub., Tokyo, Japan.

[9] Ben Shneiderman, Lecture Notes of Software Engineering Seminar,
Oct. 1982, SRA International Inc.





THE VIEWGRAPH MATERIALS
for the 

I. MIYAMOTO PRESENTATION WERE INCORPORATED IN THE PAPER. 





N83 32363 


DESIGN AIDS FOR REAL-TIME SYSTEMS (DARTS)
Paul A. Szulewski 

The Charles Stark Draper Laboratory, Inc. 
Cambridge, Massachusetts, 02139 


Abstract 

Introduction 

Design-Aids for Real-Time Systems (DARTS) is a tool that assists in
defining embedded computer systems through tree-structured graphics,
military standard documentation support, and various analyses, including
automated Software Science [1] parameter counting and metrics calculation.
These analyses provide both static and dynamic design quality feedback,
which can potentially aid in producing efficient, high-quality software
systems.


DARTS Overview 

DARTS uses a mix of hierarchy, control and communications primitives,
and data structures to represent real-time systems. Requirements are
expressed as a functional hierarchy, and designs as a tree-structured
hierarchy of communicating processes. This hierarchical structure provides
two distinct advantages: the system can be viewed at different
levels of detail as required, and changes (e.g., subtree move and delete)
can be easily implemented.
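Both advantages, viewing the system at different levels of detail and cheap subtree edits, follow from storing the design as a tree. The following minimal sketch uses a hypothetical `Node` class with `add`, `delete`, `move_to`, and `view` methods; it illustrates the idea only and does not reflect the actual DARTS PL/I data structures.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity comparison, so list.remove finds this exact node
class Node:
    name: str
    children: list[Node] = field(default_factory=list)
    parent: Optional[Node] = field(default=None, repr=False)

    def add(self, child: Node) -> Node:
        """Attach child (and its whole subtree) under this node."""
        child.parent = self
        self.children.append(child)
        return child

    def delete(self) -> None:
        """Detach this node, and with it the whole subtree, from its parent."""
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None

    def move_to(self, new_parent: Node) -> None:
        """Move this subtree under new_parent, refusing moves that make a cycle."""
        ancestor: Optional[Node] = new_parent
        while ancestor is not None:
            if ancestor is self:
                raise ValueError("cannot move a node into its own subtree")
            ancestor = ancestor.parent
        self.delete()
        new_parent.add(self)

    def view(self, depth: int, indent: int = 0) -> list[str]:
        """Names down to `depth` levels below this node: a level-of-detail view."""
        lines = ["  " * indent + self.name]
        if depth > 0:
            for child in self.children:
                lines += child.view(depth - 1, indent + 1)
        return lines


root = Node("system")
sensors = root.add(Node("sensor I/O"))
control = root.add(Node("control"))
smoothing = sensors.add(Node("smoothing"))
smoothing.move_to(control)
print("\n".join(root.view(depth=2)))
```

A move or delete touches only the parent links at the point of change; everything below the moved node travels with it unchanged.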

Although developed specifically to represent real-time interactions,
DARTS can be used to define both real-time and non-real-time systems.
Specific real-time capabilities include an ability to represent and model

(1) interactions between the computer system and external
sensors and effectors,

(2) interactions between processors in a distributed
system design, and

(3) interrupt processing and the flow of control in multi-
programmed software designs.


P. Szulewski 
Draper Lab 
1 of 20 




Through a friendly, menu-oriented interface, a user can represent a
system; perform data flow checking; generate simulations of the design for
response time, throughput, and utilization; request a variety of data
tables and graphical tree-structured output in various sizes; and
calculate Software Science complexity measures.

DARTS User Interface 

DARTS is implemented as a PL/I program on an Amdahl 470 V/8. A
user is presented with a menu-driven, full-screen interface [2], which
users with no prior computer background have found easy to learn and use.
Through this interface, an analyst can build and maintain a library of
DARTS data bases, generate both graphical and tabular output, and initiate
various analysis functions.

DARTS Data Base 

The DARTS data base is hierarchical, with records corresponding to 
each of tire nodes in the DARTS tree. The records contain data pertaining 
to control flow, data flow, and relational information for the nodes in 

the tree. Various attributes can be associated with the nodes of the 
tree. Nodes can have names, input and output variable lists, free text 
descriptors, durations, and actual assignment statements to he executed 
during a simulation. Nodes can also have predicates that determine the 
flow of control at branch points. Durations can be deterministic or can 
be given as random distributions. DARTS processes can be assiqned 

priorities to allow one process to interrupt another. Thus, interrupt 
structures and preemption can be explicitly specified and modeled. 
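A node record carrying the attributes listed above might look like the sketch below. The field names, the use of an exponential distribution for random durations, and the priority convention (a larger number preempts a smaller one) are assumptions for illustration, not the DARTS record layout.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ProcessNode:
    name: str
    indata: list[str] = field(default_factory=list)     # input variable list
    outdata: list[str] = field(default_factory=list)    # output variable list
    description: str = ""                                # free-text descriptor
    duration: Optional[float] = None                     # deterministic duration
    mean_duration: Optional[float] = None                # mean of a random duration
    predicate: Optional[Callable[[dict], bool]] = None   # branch condition
    priority: int = 0                                    # larger preempts smaller

    def sample_duration(self, rng: random.Random) -> float:
        """Deterministic duration if given, otherwise an exponential draw."""
        if self.duration is not None:
            return self.duration
        if self.mean_duration is not None:
            return rng.expovariate(1.0 / self.mean_duration)
        return 0.0

    def takes_branch(self, env: dict) -> bool:
        """Evaluate the branch predicate; nodes without one always proceed."""
        return True if self.predicate is None else self.predicate(env)

    def can_preempt(self, other: "ProcessNode") -> bool:
        return self.priority > other.priority


read = ProcessNode("read sensor", outdata=["raw"], duration=2.0, priority=5)
log = ProcessNode("log data", indata=["raw"], mean_duration=4.0, priority=1)
print(read.sample_duration(random.Random(1)), read.can_preempt(log))
```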

Data Flow Checking

Data flow consistency checking verifies that variables are produced
before they are referenced and referenced after they are produced.
Documentation outputs currently consist of a data base listing, the DARTS
tree, a data-flow table showing data producer/consumer relationships for
the nodes in the tree, data set/use tables, and module tables. These
graphical and tabular outputs are embedded easily into word-processing
files for automatic specification generation.
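The two consistency rules can be sketched over a list of nodes in execution order. This simplification is an assumption: DARTS checks a tree with branches and concurrent processes, which the sketch ignores, and the `check_data_flow` function and its (name, indata, outdata) tuple format are hypothetical.

```python
def check_data_flow(nodes: list[tuple[str, list[str], list[str]]]) -> list[str]:
    """Check (name, indata, outdata) nodes listed in execution order.

    Reports variables referenced before any node produces them, and
    variables produced but never referenced afterwards.
    """
    problems = []
    produced: dict[str, int] = {}      # variable -> index of first producer
    referenced: set[str] = set()       # variables referenced after production
    for i, (name, indata, outdata) in enumerate(nodes):
        for var in indata:             # inputs are checked before outputs
            if var not in produced:
                problems.append(f"{name}: '{var}' referenced before it is produced")
            else:
                referenced.add(var)
        for var in outdata:
            produced.setdefault(var, i)
    for var in produced:
        if var not in referenced:
            problems.append(f"'{var}' produced but never referenced")
    return problems


design = [
    ("sample", [], ["raw"]),
    ("filter", ["raw", "gain"], ["smooth"]),
    ("display", ["smooth"], ["status"]),
]
for problem in check_data_flow(design):
    print(problem)
```

Here `gain` is flagged because no earlier node produces it, and `status` because nothing consumes it.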




Automatic Simulation 




A simulation capability [3] is available to provide estimates of
performance factors, using a simulation language developed at the University
of Birmingham, the Extended Control and Simulation Language (ECSL). A
translator automatically converts a DARTS representation into an ECSL
program. Statistics on performance factors such as response time, down
time, utilization, and throughput are automatically collected and
maintained by the DARTS/ECSL system. These statistics can be displayed in
histogram formats for analysis.
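DARTS delegates simulation to ECSL, which is not reproduced here. To show what the collected statistics mean, the sketch below simulates a single processor serving jobs first-come, first-served and reports mean response time, utilization, and throughput. The single-processor, non-preemptive model, the function name `simulate`, and its (arrival, service) input format are simplifying assumptions.

```python
def simulate(jobs: list[tuple[float, float]]) -> dict[str, float]:
    """Simulate one processor serving (arrival_time, service_time) jobs FCFS.

    Returns mean response time (completion minus arrival), utilization
    (busy time / elapsed time), and throughput (jobs completed per unit time).
    Assumes at least one job and a positive elapsed time.
    """
    clock = 0.0
    busy = 0.0
    response_times = []
    for arrival, service in sorted(jobs):
        start = max(clock, arrival)   # wait if the processor is still busy
        clock = start + service       # completion time of this job
        busy += service
        response_times.append(clock - arrival)
    elapsed = clock - min(arrival for arrival, _ in jobs)
    return {
        "mean_response": sum(response_times) / len(jobs),
        "utilization": busy / elapsed,
        "throughput": len(jobs) / elapsed,
    }


stats = simulate([(0.0, 2.0), (1.0, 2.0), (6.0, 1.0)])
print(stats)
```

In this example the second job waits one time unit for the first to finish, so its response time (3) exceeds its service time (2), and the processor is idle between times 4 and 6.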

Software Quality Metrics 

Experimental metrics of software design quality are among the design
feedback analysis features in DARTS. These metrics, based on Software
Science [1], are useful in assessing the quality of competing software
designs, as well as being predictors of other software planning parameters
(e.g., size, effort, project duration, and number of modules).

Prior research [4,5] has shown that it is possible to identify and
count Software Science parameters in software design media. Experimental
data suggest that these metrics correlate with a subjective assessment of
the criteria they were intended to measure.
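The Software Science measures that DARTS reports can be computed directly from operator and operand counts. The sketch below uses Halstead's standard definitions; the function name `halstead` and its `n2_star` argument (the number of conceptually unique input/output operands, needed for potential volume) are assumptions, and `n2_star = 7` is inferred from the printed potential volume of 28.529, which equals 9 log2 9, rather than stated by DARTS. "Percent off" is taken as (N - N_hat) / N, which matches the printed values.

```python
import math


def halstead(n1: int, n2: int, N1: int, N2: int, n2_star: int) -> dict[str, float]:
    """Halstead Software Science measures from operator/operand counts.

    n1, n2: distinct operators and operands; N1, N2: total occurrences;
    n2_star: conceptually unique input/output operands (for potential volume).
    """
    vocabulary = n1 + n2
    length = N1 + N2
    est_length = n1 * math.log2(n1) + n2 * math.log2(n2)
    volume = length * math.log2(vocabulary)
    potential_volume = (2 + n2_star) * math.log2(2 + n2_star)
    level = potential_volume / volume          # design level L = V* / V
    est_level = (2 / n1) * (n2 / N2)           # estimated level from counts
    return {
        "vocabulary": vocabulary,
        "length": length,
        "estimated_length": est_length,
        "percent_off": 100 * (length - est_length) / length,
        "volume": volume,
        "potential_volume": potential_volume,
        "level": level,
        "estimated_level": est_level,
        "intelligence_content": est_level * volume,
        "language_level": level * potential_volume,
        "effort": volume / level,
    }


# Counts from the SIMPLE column of the DARTS metric analysis viewgraph.
m = halstead(n1=3, n2=20, N1=35, N2=55, n2_star=7)
print({k: round(v, 3) for k, v in m.items()})
```

With these counts the function gives a volume of about 407.12, an intelligence content of about 98.70, and an effort of about 5810, in line with the SIMPLE column of the metric analysis printout.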


References 


[1] Halstead, M.H., Elements of Software Science, Elsevier North-Holland,
Inc., New York, 1977.

[2] "Design-Aids for Real-Time Systems (DARTS): Users Guide," Version 3,
CSDL-C-5441, The Charles Stark Draper Laboratory, Inc., January 12,
1982.

[3] Purtek, P.C., DeWolf, J.B., and Buchan, P., "DARTS: A Tool for
Specification and Simulation of Real-Time Systems," Proceedings of
the AIAA Computers in Aerospace III Conference, October 1981.

[4] Szulewski, P.A., Whitworth, M.H., Buchan, P., DeWolf, J.B., Quality
Assurance Guidelines and Quality Metrics for Embedded Real-Time
Software Designs, CSDL-R-1376, The Charles Stark Draper Laboratory,
Inc., May 1980.

[5] Szulewski, P.A., Whitworth, M.H., Buchan, P., DeWolf, J.B., "The
Measurement of Software Science Parameters in Software Designs," ACM
SIGMETRICS Performance Evaluation Review, Vol. 10, No. 1, Spring
1981.


THE VIEWGRAPH MATERIALS 
for the 

P. SZULEWSKI PRESENTATION FOLLOW 




The Charles Stark Draper Laboratory, Inc. 

Cambridge, Massachusetts 02139


DARTS

DESIGN-AIDS FOR REAL-TIME SYSTEMS 

by 

Paul A. Szulewski 




Presented at the 

Seventh Annual Software Engineering Workshop 

December 1, 1982 

Goddard Space Flight Center 
Greenbelt, Maryland 

8211C376-1



DARTS OVERVIEW 


What is DARTS? 

— An automated tool for the specification, simulation, and 
analysis of distributed, real-time systems 

What is its underlying model? 

— Hierarchical structure 

— Process oriented 

What features aid the designer? 

— Documentation in a variety of formats 

— Explicit control flow and data flow 

— Automatic simulation 

— Automatic software quality analysis 

What features aid management? 

— Concise and understandable documentation 

— Computerized data base 




PROBLEM 


• Defining requirements and preliminary designs 

Crucial 

But time consuming 
Not Systematic 

• Resulting deficiencies 

Inadequate throughput/memory 
Cost/time overruns 

Reduced reliability, testability, maintainability 
Project failure 





REQUIREMENTS/DESIGN METHODOLOGY

[Figure: a DARTS design tree and the design document produced from it]

8211C376-7


USING DARTS

[Figure: block diagram linking the DARTS user interface, the DARTS design data base, DESIGN AIDS FOR REAL-TIME SYSTEMS (DARTS), data extraction (displays, tabulations), design analyses (dataflow, simulation, quality metrics), reports and displays, and design feedback]






DARTS MENUS 


• Primary features 

— DARTS invocation
— Simulation 
- Utilities 

• Secondary features 

— Systems management 
— Tree management 
— Graphics 
— Tables 
— Analysis 


8211C376-6 




PROCESS ARCHITECTURE TREE 















DARTS AUTOMATED SOFTWARE QUALITY MEASUREMENT 


• Objective measure of software quality 


• Uses Halstead's software science method 

• Accommodates varying levels of design detail 


• Automatic measurements from DARTS data base 




8211C376-2




DARTS THREE COUNTING METHODS 


• Simple 

— All nodes are counted the same, and all indata and outdata 
lists are counted 

• Uninterpreted 

— Nodes are differentiated as being either functional nodes or 
decision nodes. Data lists are read accordingly: indata and 
outdata for functional nodes, and predvar lists for decision 
nodes. Each node is counted separately by node ID 

• Interpreted 

— Nodes are counted by name and all tabs are parsed for oper- 
ators and operands. Data lists are ignored 


8211C376-3







DARTS EXAMPLE

[Figure: example DARTS design tree]

[Figure: DARTS plotter output for the example design (DESIGN-AIDS FOR REAL-TIME SYSTEMS), dated 24 Nov 1982, 3 generations]





DARTS METRIC ANALYSIS 


CSDL ** DESIGN-AIDS FOR REAL-TIME SYSTEMS **          PAGE 2
HALSTEAD METRIC                     TOPNODE 10:7.2.2.2.4
DATABASE IS: TEST                   3 GENERATIONS
USER IS: PAS3132                    DATE: 24 NOV 1982
                                    TIME: 13:12:14

COUNTING METHOD:            SIMPLE   UNINTERPRETED   INTERPRETED

DISTINCT OPERATORS               3              14            24
DISTINCT OPERANDS               20              20            21
TOTAL OPERATORS                 35              31            56
TOTAL OPERANDS                  55              42            49
VOCABULARY                      23              34            45
DESIGN LENGTH                   90              73           105
ESTIMATED LENGTH              91.2           139.7         202.3
PERCENT OFF                  -1.33          -91.43        -92.65
DESIGN VOLUME              407.121         371.385       576.645
POTENTIAL VOLUME            28.529          28.529        28.529
DESIGN LEVEL                 0.070           0.077         0.049
ESTIMATED DESIGN LEVEL       0.242           0.068         0.036
INTELLIGENCE CONTENT        98.696          25.264        20.594
LANGUAGE LEVEL               1.999           2.192         1.411
EFFORT                    5809.715        4834.555     11655.336



















DARTS SUMMARY 


• User friendly 

• Hierarchical structure 

• Can accommodate real-time software

• Static quality analysis 

• Dynamic analysis 

• Documentation support 

• Design traceability 




DARTS FUTURE 


• Near term 


— Validate existing metrics and add others to the DARTS
design quality analysis feature. (This effort is pres-
ently under contract to Rome Air Development Center,
#F30602-82-C-0130.)

• Long term 

— Consider DARTS as a part of an integrated software 
engineering support environment 


8211C376-5




PANEL #3 

SOFTWARE ERRORS 


T. Ostrand/E. Weyuker, Sperry Univac/Courant Institute 
E. Soloway/W. Johnson/S. Draper, Yale/University of California
D. Buckland, Reifer Consultants 



SOFTWARE ERROR DATA COLLECTION AND CATEGORIZATION


Thomas J. Ostrand                        Elaine J. Weyuker
Software Technology Research             Courant Institute
Sperry Univac                            New York University
Blue Bell, PA 19424                      New York, NY 10012

Seventh Annual Software Engineering Workshop
Goddard Space Flight Center
December 1, 1982

A study has been made of the software errors detected during
development of an interactive special-purpose editor system. This product
has been followed during nine months of coding, unit testing, function
testing, and system testing. Detected errors and their fixes have been
described by testers and debuggers. To help analyze the relationship of
error characteristics to the various aspects of the software development
process, a new error categorization scheme has been developed. Within this
scheme, 17? errors were classified. For each error, we asked the
programmers to select the most likely cause of the error, report the stage
of the software development cycle in which the error was created and first
noticed, and describe the circumstances of its detection and isolation, including
time required, techniques tried, and successful techniques.

The programmers were also asked to give a written description of the
error, its symptoms, and its correction. The new error categorization
scheme was developed from these descriptions. Four generic attributes, or
dimensions, of software errors were identified; an error is classified by
assigning it a value for each dimension. The four dimensions and their
possible values reflect the specific errors studied for this project. As
the study is extended to development projects producing different types of
software and different types of errors, the dimensions and their values will
be extended as needed.

T. Ostrand

Univac 
1 of 33 



The four present error dimensions are:

• Major Category - a broad description of the error, identifying
the type of code that was changed to make the correction.
The seven major categories into which errors from the
interactive editor have been put are:

    Data Definition - code that defines constants, storage areas,
    control codes, transfer tables, etc.

    Data Handling - code that modifies or initializes the values of
    variables.

    Decision - code that evaluates a condition and branches according
    to the result.

    Decision plus Processing - code that evaluates a condition and
    performs a specific computation if the condition is satisfied.

    Documentation - written description of the product.

    System - an error external to the program itself, including
    operating system, compiler, hardware, etc.

    Not Error - problem reports that are resolved without changing any
    part of the system or product.

• Type - more specific information modifying the major category.

• Presence - whether the error involves omitted, superfluous, or
incorrect code.

• Use - whether the error involves an initialization, update, or
setting of data.

The interactive editor system is a small project; three programmers
spent about two person-years in its development and testing. The source
code consists of about 9000 lines of high-level language and 1000 assembler
instructions. Obviously, this small size and the limited number of
programmers prevent us from drawing any far-reaching conclusions from the
error data. We view this study as a pilot effort whose primary results have
been the experience gained in collecting software error data, the creation of
the error categorization scheme, and the formation of a number of hypotheses
about software development and validation methods.

The experience will be applied to future error studies, which are
planned on other software projects. The categorization scheme will be used
to classify the errors reported from these projects, and will be extended
with additional attributes and major categories. The hypotheses will be
examined in light of the error information collected from these additional
projects.
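The four dimensions translate naturally into a record type. The sketch below encodes the seven major categories, the presence values, and the use values named in the text; the free-form `error_type` string, the class and function names, and the sample records are hypothetical. The `presence_split` function computes the kind of omitted-versus-incorrect split reported for decision-related errors.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Major(Enum):
    DATA_DEFINITION = "data definition"
    DATA_HANDLING = "data handling"
    DECISION = "decision"
    DECISION_PLUS_PROCESSING = "decision plus processing"
    DOCUMENTATION = "documentation"
    SYSTEM = "system"
    NOT_ERROR = "not error"


class Presence(Enum):
    OMITTED = "omitted"
    SUPERFLUOUS = "superfluous"
    INCORRECT = "incorrect"


class Use(Enum):
    INITIALIZATION = "initialization"
    UPDATE = "update"
    SETTING = "setting"


@dataclass(frozen=True)
class ErrorRecord:
    major: Major
    error_type: str              # free-form refinement of the major category
    presence: Presence
    use: Optional[Use] = None    # only meaningful for data-related errors


def presence_split(records: list[ErrorRecord], majors: set[Major]) -> dict[str, float]:
    """Percentage of each presence value among records in the given categories."""
    selected = [r.presence for r in records if r.major in majors]
    if not selected:
        return {}
    counts = Counter(selected)
    return {p.value: 100 * counts[p] / len(selected) for p in Presence}


records = [
    ErrorRecord(Major.DECISION, "missing case", Presence.OMITTED),
    ErrorRecord(Major.DECISION_PLUS_PROCESSING, "missing check", Presence.OMITTED),
    ErrorRecord(Major.DECISION, "wrong comparison", Presence.INCORRECT),
    ErrorRecord(Major.DATA_HANDLING, "stale counter", Presence.INCORRECT, Use.UPDATE),
]
decision = {Major.DECISION, Major.DECISION_PLUS_PROCESSING}
print(presence_split(records, decision))
```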

Even within the small scope of the data collected from the editor
project, some interesting relationships were observed between an error's
major category on the one hand, and the error's presence and the type of
testing which detected it on the other. Among decision-related errors
(major category Decision or Decision plus Processing), 81% were omitted code
and 19% were incorrect code. For data definition errors, 31% were omissions
and 69% incorrect. Data handling errors were split approximately evenly
between omitted and incorrect code, as was the entire set of errors reported
on. Previous error studies have reported a similar majority of omitted-code
errors involving decisions. In five software projects monitored at TRW
[13], decision-related errors of omission ranged from 65% to 96% of all
decision-related errors. In turn, the decision errors were 11% to 36% of
all errors. Glass [7] counted 60 "omitted logic" errors out of a total of
200.

At the present time the interactive editor has just been released to
customers; all errors reported to date have been detected during internal
testing activities. A very large majority of these pre-release errors were
isolated and corrected quickly. Less than 1 hour per error was expended to
isolate 79% of the errors and to correct 71%. Within 4 hours, 88% were
isolated and 90% were corrected. These figures are similar to the effort
measured by Weiss [16] and Presson [11].

Since our error collection spanned the entire development process, we
were able to observe substantial differences between the effectiveness of
unit and function testing for detecting some categories of errors. Unit
testing is performed by the software project's original coders, testing
their own modules or procedures. The goal is to find errors affecting the
functional behavior of these individual units. Function testing is
performed on the complete product by a separate testing group. A test plan
is developed from the user manual, and the test cases attempt to execute all
potential user activities with the product. Unit testing detected twice as
many data handling errors as function testing did (22 vs. 11). Function
testing was more successful on data definition errors (47 to 7), decision
errors (20 to 10), and decision plus processing errors (25 to 1).

These figures may reflect an inherent weakness in the ability of unit
testing to detect certain categories of errors. Another possibility,
however, is that unit testing is most successful when errors occur primarily
through programmer failings, and least successful when errors are due to
"high-level" problems such as ambiguous or incomplete specifications. This
interpretation is supported by the programmers' choices of reasons for
errors occurring. The three most commonly cited error causes were programmer
error (68%), poor specifications (13%), and clerical (9%). Of the 21 errors
due to poor specifications, only one was detected in unit testing, and
seventeen were detected in function testing.

Errors caused by poor specifications were not only detected later than
the average of all errors; they also required more effort to correct. Only
24% of specification-caused errors were fixed in under 1 hour, 52% took 1 to 4
hours, and the remaining 24% took from 4 hours to over 1 day. The relatively high
correction effort for these errors is consistent with the common belief that the
cost of correcting an error increases when the error remains in the system
during multiple phases of the development cycle. Page [10], for example,
states that the correction cost approximately doubles as an error enters
each successive phase. These specification-caused errors entered the system
during program design, and remained undetected during coding and unit
testing. In addition, the error-fixing effort reported here is only the
time spent by the programmers in constructing fixes, and does not include
the effort expended by an independent tester in detecting the error and
supplying additional diagnostic information. If these were included, the
total correction cost would be even higher.
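Page's rule of thumb can be made concrete. If correction cost roughly doubles each time an error passes into a new phase, an error created in program design that survives coding and unit testing and is found in function testing has entered three later phases, so its correction cost is about 2^3 = 8 times what it would have been in design. The phase list is taken from this paper; treating the doubling as exact, and the function name `relative_fix_cost`, are simplifying assumptions.

```python
PHASES = ["program design", "coding", "unit testing", "function testing",
          "system testing"]


def relative_fix_cost(created: str, detected: str, factor: float = 2.0) -> float:
    """Cost of fixing an error relative to fixing it in the phase where it
    was created, if cost multiplies by `factor` for each later phase entered."""
    steps = PHASES.index(detected) - PHASES.index(created)
    if steps < 0:
        raise ValueError("an error cannot be detected before it is created")
    return factor ** steps


print(relative_fix_cost("program design", "function testing"))  # 8.0
```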


References 


[1] Amory, W. and J.A. Clapp, "A Software Error Classification Methodology",
MTR-2648, Vol. VII, Mitre Corp., Bedford, MA, 30 June 1973.

[2] Baker, W.F., "Software Data Collection and Analysis: A Real-Time System
Project History", RADC-TR-77-192, Rome Air Development Center, Griffiss AFB,
NY, June 1977.

[3] Basili, V.R., "Data Collection Validation and Analysis", Draft Software
Metrics Panel Final Report, ed. A.J. Perlis, F.G. Sayward, and M. Shaw,
Washington, DC, 30 June 1980.

[4] Basili, V.R. and D.M. Weiss, "Analyzing Error Data in the Software
Engineering Laboratory", Fourth Minnowbrook Workshop on Software Performance
Evaluation, Blue Mtn. Lake, NY, August 1981.

[5] Basili, V.R., M.V. Zelkowitz, F.E. McGarry, R.W. Reiter, W.F.
Truszkowski, and D.M. Weiss, "The Software Engineering Laboratory", Tech.
Report TR-535, U. Maryland Computer Science Center, College Park, MD, May
1977.

[6] Endres, A., "An Analysis of Errors and Their Causes in System Programs",
IEEE Trans. Softw. Eng., Vol. SE-1, June 1975, 140-149.

[7] Glass, R.L., "Persistent Software Errors", IEEE Trans. Softw. Eng.,
Vol. SE-7, March 1981, 162-168.

[8] Litecky, C.R. and G.B. Davis, "A Study of Errors, Error-Proneness, and
Error Diagnosis in Cobol", Comm. ACM, Vol. 19, January 1976, 33-37.

[9] Mandis, K.S. and M.L. Goins, "Categorizing and Predicting Errors in
Software Programs", Proc. 2nd AIAA Computers in Aerospace Conf., Los
Angeles, October 1979, 300-308.

[10] Page, J., "Evaluating the Effects of an Independent Verification and
Validation Team", Proc. 6th Ann. Software Eng. Workshop, Goddard Space
Flight Center, Greenbelt, MD, December 1981.



[11] Presson, P.E., "A Study of Software Errors on Large Aerospace
Projects", Proc. Nat. Conf. on Software Technology and Management,
Alexandria, VA, October 1981.

[12] Schneidewind, N. and H. Hoffmann, "An Experiment in Software Error Data
Collection and Analysis", IEEE Trans. Softw. Eng., Vol. SE-5, May 1979,
276-286.

[13] Thayer, T.A., M. Lipow, and E.C. Nelson, Software Reliability, TRW
Series of Software Technology, Vol. 2, North-Holland, Amsterdam, 1978.

[14] Thibodeau, R., "The State-of-the-Art in Software Error Data Collection
and Analysis - Final Report", General Research Corp., Huntsville, AL,
Jan. 31, 1978.

[15] Weiss, D.M., "Evaluating Software Development by Error Analysis: The
Data from the Architecture Research Facility", J. Systems and Software,
Vol. 1, 1979, 57-70.

[16] Weiss, D.M., "Evaluating Software Development by Analysis of Change
Data", Tech. Report TR-1120, U. Maryland Computer Science Center, College
Park, MD, November 1981.





THE VIEWGRAPH MATERIALS 
for the 

T. OSTRAND/E. WEYUKER PRESENTATION FOLLOW 




SOFTWARE ERROR DATA COLLECTION 
AND CATEGORIZATION 


THOMAS OSTRAND                  ELAINE WEYUKER
SOFTWARE TECHNOLOGY             COURANT INSTITUTE
SPERRY UNIVAC                   NEW YORK UNIVERSITY


SEVENTH SOFTWARE ENGINEERING WORKSHOP 
GODDARD SPACE FLIGHT CENTER 

DECEMBER 1, 1982 





PROJECT DESCRIPTION 


PURPOSE: IMPLEMENT A LANGUAGE-SPECIFIC INTERACTIVE EDITOR.

FEATURES: - TEMPLATES FOR DATA DEFINITIONS AND CONTROL STRUCTURES
          - FORMATTING OF SOURCE CODE
          - DYNAMIC SYNTAX CHECKING
          - PROMPTING FOR REQUIRED PROGRAM SECTIONS

SCHEDULE: - SPECIFICATION AVAILABLE     11/80
          - CODING BEGAN                (date illegible)
          - FUNCTION TESTING BEGAN      11/81
          - SYSTEM TESTING BEGAN        (date illegible)
          - CUSTOMER TESTING BEGAN      6/82
          - RELEASE                     11/82





PROJECT DESCRIPTION 


STAFF: - 1 FULL-TIME, 2 PART-TIME PROGRAMMERS

SIZE:  - SOURCE CODE: 9,000 LINES HLL, 1,000 LINES AL
       - OBJECT CODE: 70,000 BYTES




CHANGE INFORMATION COLLECTED FROM PROGRAMMERS 


CHECK-OFF INFORMATION 

- PROBLEM DETECTION METHODS 

- PROBLEM ISOLATION METHODS 

- ORIGINAL CODER 

- TIME REQUIRED FOR ERROR ISOLATION AND ERROR FIXING 

- SIZE OF CHANGE 

- WHEN PROBLEM WAS NOTICED 

- WHEN PROBLEM WAS CREATED 

- WHY DID THE PROBLEM OCCUR 




CHANGE INFORMATION COLLECTED FROM PROGRAMMERS 


WRITTEN INFORMATION 

- DATES 

- NAMES OF CHANGED UNITS 

- DESCRIPTIONS OF 

• PROBLEM SYMPTOMS 

• ACTUAL PROBLEM 

• FIX 

- OTHER MISCELLANEOUS INFORMATION 
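The check-off and written items on the two slides above can be held in one record per change. The Python sketch below shows one way to do that. Every field name is hypothetical, since the slides list topics and not a form layout. The `effort_bin` helper maps a recorded time onto the effort categories used in the project's isolation and fixing charts. It assumes a "day" means 8 working hours and uses inclusive upper bounds; neither choice is stated in the source.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

HOURS_PER_DAY = 8.0  # assumption: one working day; not stated in the slides


@dataclass
class ChangeReport:
    """One programmer-supplied change report (field names are hypothetical)."""

    # Check-off information
    detection_method: str
    isolation_method: str
    original_coder: str
    isolation_hours: Optional[float]  # None when the time was not recorded
    fix_hours: Optional[float]
    change_size_lines: int
    phase_noticed: str
    phase_created: str
    cause: str
    # Written information
    report_date: date
    changed_units: List[str] = field(default_factory=list)
    symptoms: str = ""
    actual_problem: str = ""
    fix_description: str = ""
    other_notes: str = ""


def effort_bin(hours: Optional[float]) -> str:
    """Map a recorded effort onto the effort-chart categories (assumed bounds)."""
    if hours is None:
        return "UNKNOWN"
    if hours < 1:
        return "LESS THAN 1 HOUR"
    if hours <= 4:
        return "1 TO 4 HOURS"
    if hours <= HOURS_PER_DAY:
        return "4 HOURS TO 1 DAY"
    return "MORE THAN 1 DAY"


# Illustrative record only; these values are not data from the study.
report = ChangeReport(
    detection_method="unit test",
    isolation_method="code reading",
    original_coder="programmer A",
    isolation_hours=0.5,
    fix_hours=None,
    change_size_lines=3,
    phase_noticed="unit testing",
    phase_created="coding",
    cause="misread specification",
    report_date=date(1982, 6, 1),
    changed_units=["hypothetical_unit"],
)
```

Keeping the effort values numeric and binning them only when reporting means the chart categories can be changed later without re-collecting data.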





ERROR CATEGORIZATION METHODS 


AMORY & CLAPP        MITRE
ENDRES               IBM DOS SOFTWARE
THAYER ET AL         TRW APPLICATIONS
GLASS                BOEING APPLICATIONS


CHARACTERISTICS OF THESE METHODS ARE: 

- TREE SCHEME FOR CATEGORIZATION 

- AMBIGUOUS, OVERLAPPING, INCOMPLETE CATEGORIES 

- TOO MANY CATEGORIES 

- FAILURE TO DISTINGUISH BETWEEN: 

• SYMPTOMS OF AN ERROR 

• DESCRIPTIVE CHARACTERISTICS OF AN ERROR 

• CAUSE OF AN ERROR'S EXISTENCE 




ATTRIBUTES IN OUR CURRENT SCHEME 


• MAJOR CATEGORY 


• TYPE


• PRESENCE 


• USE 






ATTRIBUTES 


MAJOR CATEGORY 

DATA DEFINITION 

- DEFINE CONSTANTS, STORAGE AREAS, CONTROL CODES, ETC.

DATA HANDLING

- SET, INITIALIZE, OR MODIFY VALUES OF VARIABLES.

DECISION

- EVALUATE A CONDITION AND BRANCH ACCORDING TO THE RESULT.

DECISION & PROCESS

- EVALUATE A CONDITION AND PERFORM A COMPUTATION.

DOCUMENTATION

- DESCRIPTION OF PRODUCT OR CODE

CLERICAL

- TYPING, HANDWRITING

SYSTEM

- PROBLEM IN THE ENVIRONMENT EXTERNAL TO THE PROGRAM AND ITS DOCUMENTATION.

NOT AN ERROR

- PROBLEM RESOLVED WITHOUT CHANGING THE PRODUCT OR SYSTEM




ATTRIBUTES 


TYPE: MODIFIES THE MAJOR CATEGORY 


• FOR ERRORS INVOLVING DATA:

ADDRESS - IDENTIFIES LOCATION IN MEMORY.

EXAMPLES: ARRAY INDEX, LIST POINTER, TABLE NAME, OFFSET INTO A
DEFINED STORAGE AREA.

CONTROL - DETERMINES APPROPRIATE FLOW OF CONTROL

DATA - PRIMARY INFORMATION WHICH IS READ, WRITTEN, OR PROCESSED.

• FOR ERRORS INVOLVING DECISIONS:

LOOP

MULTIPLE-WAY BRANCH




ATTRIBUTES 


PRESENCE: CODE WAS

OMITTED - LEFT OUT

SUPERFLUOUS - PRESENT, BUT NOT NEEDED

INCORRECT - PRESENT, AND HAD TO BE CHANGED.

USE: THE TYPE OF OPERATION PERFORMED ON DATA

SET 

INITIALIZE 

UPDATE 
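The attribute slides above describe a multi-dimensional scheme. Each error gets one value per attribute (major category, type, presence, use) rather than one path through a category tree. The Python sketch below models that as one enum per attribute plus a record type. Attributes that do not apply (for example, USE for a decision error) are left empty. The class names and the `tally` helper are illustrative assumptions, not the authors' tooling. The sample records are invented, not data from the study.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class MajorCategory(Enum):
    DATA_DEFINITION = "data definition"
    DATA_HANDLING = "data handling"
    DECISION = "decision"
    DECISION_AND_PROCESS = "decision & process"
    DOCUMENTATION = "documentation"
    CLERICAL = "clerical"
    SYSTEM = "system"
    NOT_AN_ERROR = "not an error"


class ErrorType(Enum):
    ADDRESS = "address"  # types for errors involving data
    CONTROL = "control"
    DATA = "data"
    LOOP = "loop"  # types for errors involving decisions
    MULTIPLE_WAY_BRANCH = "multiple-way branch"


class Presence(Enum):
    OMITTED = "omitted"
    SUPERFLUOUS = "superfluous"
    INCORRECT = "incorrect"


class Use(Enum):
    SET = "set"
    INITIALIZE = "initialize"
    UPDATE = "update"


@dataclass(frozen=True)
class ErrorRecord:
    """One error, described independently along each attribute."""
    category: MajorCategory
    presence: Optional[Presence] = None
    error_type: Optional[ErrorType] = None
    use: Optional[Use] = None


def tally(records: Iterable[ErrorRecord], attribute: str) -> Counter:
    """Count records by one attribute, ignoring all the other dimensions."""
    return Counter(getattr(r, attribute) for r in records)


# Hypothetical records, for illustration only.
records = [
    ErrorRecord(MajorCategory.DECISION, Presence.OMITTED, ErrorType.LOOP),
    ErrorRecord(MajorCategory.DATA_HANDLING, Presence.INCORRECT,
                ErrorType.ADDRESS, Use.UPDATE),
    ErrorRecord(MajorCategory.DATA_DEFINITION, Presence.OMITTED, ErrorType.DATA),
]
by_presence = tally(records, "presence")
by_category = tally(records, "category")
```

Because the attributes are independent, questions such as "how many errors were omitted code?" are a single tally. No walk over a category tree is needed.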




Major Categories of Non-Clerical Errors
(bar chart; horizontal axis: TOTAL NUMBER)

ALL ERRORS
   DATA DEFINITION           54
   DATA HANDLING             38
   DECISION                  31
   DECISION & PROCESSING     32
   OTHER & UNKNOWN            7
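The later charts give the same categories as percentages. The short Python computation below shows how counts convert to shares. It assumes the bar labels for all non-clerical errors read 54, 38, 31, 32 and 7. The second value is partly illegible in this copy but matches the 38 in the presence table. The results are therefore derived from assumed readings, not copied from the source's percent chart.

```python
# Assumed readings of the "ALL ERRORS" bar labels.
counts = {
    "DATA DEFINITION": 54,
    "DATA HANDLING": 38,
    "DECISION": 31,
    "DECISION & PROCESSING": 32,
    "OTHER & UNKNOWN": 7,
}

total = sum(counts.values())
# Share of all non-clerical errors, rounded to one decimal place.
percent = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, share in percent.items():
    print(f"{name:24s}{share:5.1f}%")
```

Under these readings the total is 162. Data definition errors make up about a third (33.3%) and data handling errors about a quarter (23.5%).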


Major Categories of Non-Clerical Errors
(bar chart; horizontal axis: TOTAL NUMBER; categories: DATA DEFINITION, DATA HANDLING,
DECISION, DECISION & PROCESSING, OTHER & UNKNOWN; series: ALL ERRORS, DETECTED IN UNIT
TESTING, DETECTED IN FUNCTION TESTING)


Major Categories of Non-Clerical Errors
(bar chart; horizontal axis: PERCENT; categories: DATA DEFINITION, DATA HANDLING, DECISION,
DECISION AND PROCESSING, OTHER & UNKNOWN; series: ALL ERRORS)


Major Categories of Non-Clerical Errors
(bar chart; horizontal axis: PERCENT; categories: DATA DEFINITION, DATA HANDLING, DECISION,
DECISION AND PROCESSING, OTHER & UNKNOWN; series: ALL ERRORS, DETECTED IN UNIT TESTING)


Major Categories of Non-Clerical Errors
(bar chart; series: ALL ERRORS, DETECTED IN FUNCTION TESTING)


Error Presence Attribute for Each Major Category

                          DATA         DATA                     DECISION PLUS
                          DEFINITION   HANDLING    DECISION     PROCESSING

NUMBER OF ERRORS
IN THIS CATEGORY          54           38          31           32

PERCENT OF ERRORS WHICH WERE INCORRECT CODE: 69%
PERCENT OMITTED CODE: 37%
PERCENT SUPERFLUOUS CODE: 5%
(Only one value per row is legible in this copy, and its column cannot be recovered.)



Error Isolation Effort
(bar chart for ALL ERRORS; horizontal axis: PERCENT OF ERRORS ISOLATED IN GIVEN TIME, 0 to 100;
categories: LESS THAN 1 HOUR, 1 TO 4 HOURS, 4 HOURS TO 1 DAY, MORE THAN 1 DAY, UNKNOWN;
legible bar label: LESS THAN 1 HOUR, 79%)


Error Isolation Effort
(bar chart, ERRORS DETECTED IN UNIT TESTING compared with ALL ERRORS; horizontal axis: PERCENT
OF ERRORS ISOLATED IN GIVEN TIME; categories: LESS THAN 1 HOUR, 1 TO 4 HOURS, 4 HOURS TO 1 DAY,
MORE THAN 1 DAY, UNKNOWN)


Error Isolation Effort
(bar chart, ERRORS DETECTED IN FUNCTION TESTING compared with ALL ERRORS; horizontal axis:
PERCENT OF ERRORS ISOLATED IN GIVEN TIME; categories: LESS THAN 1 HOUR, 1 TO 4 HOURS, 4 HOURS
TO 1 DAY, MORE THAN 1 DAY, UNKNOWN)


Error Isolation Effort
(bar chart, ERRORS DUE TO POOR SPECIFICATIONS compared with ALL ERRORS; horizontal axis:
PERCENT OF ERRORS ISOLATED IN GIVEN TIME; categories: LESS THAN 1 HOUR, 1 TO 4 HOURS, 4 HOURS
TO 1 DAY, MORE THAN 1 DAY, UNKNOWN)


Error Fixing Effort
(bar chart; horizontal axis: PERCENT OF ERRORS FIXED IN GIVEN TIME; categories: LESS THAN
1 HOUR, 1 TO 4 HOURS, 4 HOURS TO 1 DAY, MORE THAN 1 DAY, UNKNOWN)


Error Fixing Effort
(bar chart, ERRORS DETECTED IN UNIT TESTING compared with ALL ERRORS; horizontal axis: PERCENT
OF ERRORS FIXED IN GIVEN TIME; categories: LESS THAN 1 HOUR, 1 TO 4 HOURS, 4 HOURS TO 1 DAY,
MORE THAN 1 DAY, UNKNOWN)


Error Fixing Effort
(bar chart, ERRORS DETECTED IN FUNCTION TESTING compared with ALL ERRORS; horizontal axis:
PERCENT OF ERRORS FIXED IN GIVEN TIME; categories: LESS THAN 1 HOUR, 1 TO 4 HOURS, 4 HOURS
TO 1 DAY, MORE THAN 1 DAY, UNKNOWN)


Error Fixing Effort
(bar chart, ERRORS DUE TO POOR SPECIFICATIONS compared with ALL ERRORS; horizontal axis:
PERCENT OF ERRORS FIXED IN GIVEN TIME; categories: LESS THAN 1 HOUR, 1 TO 4 HOURS, 4 HOURS
TO 1 DAY, MORE THAN 1 DAY, UNKNOWN)




SUMMARY OF RESULTS 


UNIT TESTING DETECTS DATA HANDLING ERRORS WELL.


FUNCTION TESTING DETECTS DECISION-RELATED ERRORS 
AND DATA DEFINITION ERRORS WELL. 


LARGE MAJORITY OF DECISION-RELATED ERRORS ARE 
OMISSIONS. (AGREES WITH PRIOR STUDIES). 


MOST ERRORS DETECTED BEFORE RELEASE ARE ISOLATED
AND CORRECTED WITH LITTLE EFFORT (AGREES WITH
WEISS AND PRESSON).


SPECIFICATION-CAUSED ERRORS ARE MORE DIFFICULT TO 
CORRECT THAN OTHERS. 





CONCLUSIONS OR HYPOTHESES 


MULTI-DIMENSIONAL ERROR CATEGORIZATION SCHEME IS
EASIER TO USE AND MORE USEFUL FOR APPLICATIONS THAN
TRADITIONAL TREE SCHEMES.


CODE COVERAGE IS UNSATISFACTORY AS A BASIS FOR TEST
CASE GENERATION AND AS A MEANS OF ASSESSING TEST
ADEQUACY, BECAUSE OF THE LARGE NUMBER OF ERRORS
INVOLVING OMITTED CODE.


UNIT TESTING IS AN INHERENTLY WEAK METHOD FOR
DETECTION OF ERRORS CAUSED BY POOR SPECIFICATIONS.


EFFORT SPENT IN PRODUCING HIGH-QUALITY SPECIFICATIONS 
WILL SUBSTANTIALLY REDUCE THE COST OF CORRECTING 
SOFTWARE. 
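The second conclusion above says that coverage cannot reveal code that was never written. The Python sketch below illustrates this. The `average` function is a hypothetical stand-in whose guard for an empty list was omitted. `executed_lines` is a crude, illustrative statement-coverage probe built on the standard `sys.settrace` hook; it is not a real coverage tool. A single ordinary test runs every statement, yet the omission goes undetected.

```python
import sys


def average(values):
    total = 0
    for v in values:
        total += v
    return total / len(values)  # the guard for an empty list was omitted


def executed_lines(func, *args):
    """Run func(*args) and record which of its lines executed (relative to def)."""
    code = func.__code__
    lines = set()

    def tracer(frame, event, arg):
        if frame.f_code is code and event == "line":
            lines.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return lines


# One ordinary test executes all four body statements (relative lines 1-4).
covered = executed_lines(average, [2, 4])
full_statement_coverage = {1, 2, 3, 4} <= covered

# Yet the omitted code is still an error: an empty list divides by zero.
try:
    average([])
    empty_input_handled = True
except ZeroDivisionError:
    empty_input_handled = False
```

Full statement coverage is reached here, but the omission is found only by a test case derived from the specification (empty input), not from the code.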





N83-32365

Classifying Bugs is a Tricky Business

W. Lewis Johnson *

Stephen Draper **

Elliot Soloway *

* Department of Computer Science 
Yale University 
P.O. Box 2158 

New Haven, Connecticut 06520 

** Institute for Cognitive Science 
University of California, San Diego Mail Code C015 
La Jolla, California 


1. Context: Motivation and Goals 1 

About 2 years ago we decided to build a computer-based programming tutor to help students 
learn to program in Pascal; we wanted the system to identify the non-syntactic bugs in a
student’s program and tutor the student with respect to the misconceptions that might have 
given rise to the bugs. The emphasis was on the system understanding what the student did and 
did not understand; we felt that simply telling the student that there was a bug in line 14 was 
not sufficient — since oftentimes the bug in line 14 was really caused by a whole series of 
conceptual errors that could not be localised to a specific line in the program. However, in order 
to design the system we needed to know what bugs students did make in their programs and 
what misconceptions they typically labored under. On the basis of bug types found in a number 
of pencil-and-paper studies with student programmers (novices, intermediates, and advanced) 
[9, 10], we built and classroom tested a first version of such a programming tutor [11]. In the 
process of testing that system we instrumented the operating system on a CYBER 175 to 
automatically collect a copy of each syntactically correct program the student programmers 
attempted to execute while sitting at the terminal; we call this form of data “on-line protocols”. 
We collected such protocols on 204 students for an entire semester (7 programming assignments). 
We have systematically analyzed only a small portion of these data: the basis for this paper is 
the hand analysis of the first syntactically correct program that students generated for their first 
looping assignment, 2 i.e., 204 programs. 


*This work was co-sponsored by the Personnel and Training Research Groups, Psychological Sciences
Division, Office of Naval Research and the Army Research Institute for the Behavioral and Social Sciences,
Contract No. N00014-82-K-0714, Contract Authority Identification Number, NR 154-492. Approved for
public release; distribution unlimited. Reproduction in whole or part is permitted for any purpose of the 
United States Government. 

E. Soloway 
Yale 
1 of 18 


2 This problem is given in Figure 8, which will be discussed in section 4. 



The story we tell in this paper deals with our experiences in analysing these 204 on-line 
protocols. In particular, we will describe the observations we made in trying to build a bug
classification scheme; the actual details of what bugs we found, their frequency, etc. can be found 
in [5]. The key observation is the following: while one might think that building a classification 
scheme for the bugs would be straightforward, it turns out not to be so simple; in fact, we will 
argue that: 

Bugs cannot be uniquely described on the basis of features of the buggy program alone; one
must also take the programmer's intentions and knowledge state into account.


2. A Simplified Example 

Consider the problem statement in Figure 1, which is a simplified version of the first looping 
problem that the students in our study had to solve in Pascal. From a novice’s perspective the 
difficult part of this problem is making sure that the negative inputs are filtered out before they 
are processed. There are two common approaches to solving this type of problem in an Algol-like 
language such as Pascal. In Figure 2 we depict a solution in which a negative input causes 
execution of one branch of a conditional, while a non-negative input causes execution of the
major computation of the loop. We call this type of structure a Skip-guard Plan: 3 a
conditional statement is used to guard the main computation from illegal values. Notice that one 
pass through the loop will be made for each input value. The second approach is given in Figure 
3; here an embedded loop filters out the illegal values. Notice that one pass through the outside 
loop will be made for each — and only each — legal value. We call the nested loop structure an 
Embedded Filter Loop Plan. 


Write a program that reads in integers, that represent the daily rainfall in the New Haven area, 
and computes the average daily rainfall for the input values. If the input is a negative number, do 
not count this value in the average, and prompt the user to input another, legal value. Stop 
reading when 99999 is input; this is a sentinel value and should not be used in the average 
calculation. 


Figure 1: Simplified Looping Problem 

Now consider the buggy program in Figure 4. The problem with this program is that if the 
user first types a negative input, and then types the sentinel value 99999, this value will 
— incorrectly — be processed as a legitimate value. A number of questions come to mind: 

1. How should we classify this bug? 

2. What piece of code is to blame? 

3. What mental error on the student’s part might have caused this bug? 


3 See [8, 3, Ojfor a more complete discussion of programming plans. 






READ(RAINFALL);
WHILE RAINFALL <> 99999 DO
BEGIN
   IF RAINFALL < 0
   THEN
      WRITELN('BAD INPUT, TRY AGAIN')
   ELSE
   BEGIN
      TOTAL := TOTAL + RAINFALL;
      DAYS := DAYS + 1;
   END;
   READ(RAINFALL);
END;


Figure 2: Using a Skip-Guard Plan


READ(RAINFALL);
WHILE RAINFALL <> 99999 DO
BEGIN
   WHILE RAINFALL < 0 DO
   BEGIN
      WRITELN('BAD INPUT, TRY AGAIN');
      READ(RAINFALL)
   END;
   IF RAINFALL <> 99999 THEN
   BEGIN
      TOTAL := TOTAL + RAINFALL;
      DAYS := DAYS + 1;
      READ(RAINFALL)
   END;
END;


Figure 3: Using an Embedded Filter Loop Plan

4. What piece of code should we change to make the program correct? 

In order to answer these questions, however, we need to answer another one first: 

Wbat programming approach was the user trying to implement? That is, did the student intend 
to implement the tkip-guard plan or did he try to implement the embedded filter loop 
plan ? 

Answers to the first 4 questions will be different depending on how we answer this last question. 
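The Python sketch below transcribes the three rainfall fragments (Figures 2, 3 and 4) to make the failure concrete. Input is modeled as a list consumed by `next()`, standing in for Pascal's READ. Each function returns the messages written plus the final TOTAL and DAYS; the averaging step is omitted, as it is in the figures. The function names are illustrative.

```python
from typing import Iterable, List, Tuple

SENTINEL = 99999
BAD_INPUT = "BAD INPUT, TRY AGAIN"
Result = Tuple[List[str], int, int]  # (messages written, TOTAL, DAYS)


def skip_guard(inputs: Iterable[int]) -> Result:
    """Figure 2: one pass through the loop for each input value."""
    values = iter(inputs)
    messages: List[str] = []
    total = days = 0
    rainfall = next(values)
    while rainfall != SENTINEL:
        if rainfall < 0:
            messages.append(BAD_INPUT)
        else:
            total += rainfall
            days += 1
        rainfall = next(values)
    return messages, total, days


def embedded_filter(inputs: Iterable[int]) -> Result:
    """Figure 3: inner loop filters negatives; one outer pass per legal value."""
    values = iter(inputs)
    messages: List[str] = []
    total = days = 0
    rainfall = next(values)
    while rainfall != SENTINEL:
        while rainfall < 0:
            messages.append(BAD_INPUT)
            rainfall = next(values)
        if rainfall != SENTINEL:  # sentinel guard
            total += rainfall
            days += 1
            rainfall = next(values)
    return messages, total, days


def figure4_buggy(inputs: Iterable[int]) -> Result:
    """Figure 4: the embedded filter loop without the sentinel guard."""
    values = iter(inputs)
    messages: List[str] = []
    total = days = 0
    rainfall = next(values)
    while rainfall != SENTINEL:
        while rainfall < 0:
            messages.append(BAD_INPUT)
            rainfall = next(values)
        total += rainfall  # also runs when the filter loop has just read 99999
        days += 1
        rainfall = next(values)
    return messages, total, days


print(skip_guard([3, -1, 99999]))
print(embedded_filter([3, -1, 99999]))
print(figure4_buggy([3, -1, 99999, 99999]))
```

For the input 3, -1, 99999 both correct plans end with TOTAL = 3 and DAYS = 1. The buggy version adds the sentinel into the total, ending with TOTAL = 100002 and DAYS = 2. It also issues one extra READ, which the trailing 99999 in its input answers.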

We will continue this example by presenting first an argument that supports the choice of the
skip-guard plan, and then an argument that supports the choice of the embedded filter loop
plan; we will then describe a basis for making a choice between the two competing positions.


READ(RAINFALL);
WHILE RAINFALL <> 99999 DO
BEGIN
   WHILE RAINFALL < 0 DO
   BEGIN
      WRITELN('BAD INPUT, TRY AGAIN');
      READ(RAINFALL)
   END;
   TOTAL := TOTAL + RAINFALL;
   DAYS := DAYS + 1;
   READ(RAINFALL)
END;


Figure 4: Sample Buggy Program

Consider, then, Figure 5 in which we depict the buggy program again, plus a generalized,
template version of the skip-guard plan. We can describe the buggy program in terms of a
difference description between it and the generalized plan. As shown in Figure 5, there are
3 differences:

1. need an IF instead of a WHILE inside the loop, 

2. have an extra read inside the loop, 

3. will always execute the processing steps since there is no way to skip around the 
processing. 

The first difference is a plausible bug for a novice to make; in our examination of novice 
programs we have seen novices confuse IF and WHILE: students sometimes construct a loop with 
simply an IF, and sometimes they use just the test part of the WHILE statement 4 [2, 0].
Similarly, the second difference is also plausible for novices; again, we have found that novices
often add bits of spurious code, oftentimes attempting to mimic the redundancy they often use in
formulating plans and actions in the real world. Finally, if we assume that the programmer
really meant to simply test RAINFALL, then all that is missing is an ELSE to cause the skip
around the computation; novices notoriously have trouble with the ELSE parts of conditionals.
Thus, the buggy code in Figure 5 is not that different from the skip-guard plan; when
considering differences from only this plan, it is entirely conceivable that the novice
programmer was trying to implement this plan in his code.


4 While this may seem strange to us as expert programmers, if we take a moment to reflect, we can see that using
WHILE for a conditional and a loop, and IF for only the conditional part, is somewhat arbitrary, given their meanings
in English.




READ(RAINFALL);
WHILE RAINFALL <> 99999 DO
BEGIN
   WHILE RAINFALL < 0 DO
   BEGIN
      WRITELN('BAD INPUT, TRY AGAIN');
      READ(RAINFALL)
   END;
   TOTAL := TOTAL + RAINFALL;
   DAYS := DAYS + 1;
   READ(RAINFALL)
END;


BUG DESCRIPTION:

1. need an IF instead of a WHILE

2. have an extra READ in inner loop

3. missing ELSE; processing of input
   will never be skipped


Skip-Guard Plan

IF x < min
THEN
BEGIN
   print error message
END
ELSE
BEGIN
   process input
END


Figure 5: Bug Description Assuming Skip-Guard Plan

Now consider Figure 6 in which we again depict the buggy program. This time, however, we
show differences between it and a generalized, template version of an embedded filter loop
plan. Notice that the code matches the plan well; the only bug is a missing guard before the
code that processes the input: the running total update and the counter update must be
protected from including a sentinel value in the computation.

The analyses in Figures 5 and 6 would lead to different answers to the first 4 questions above.
For example, if we believe that the analysis in Figure 5 is correct, we might say the following to
the student: 5

It seems that you are having some trouble with conditional statements. For example, did you
realize that there exists a statement called IF that allows you to test ....

To correct your program, you might want to add an ELSE clause...

Moreover, we would classify the bugs as (1) an incorrect statement type, (2) a spurious read, and
(3) a missing ELSE. On the other hand, if we believe that the analysis in Figure 6 is correct, then
we might say something like the following to the student:

You should notice that if the sentinel value follows the input of a negative value, your program
will compute an incorrect average.

The bug type then might be a missing guard (conditional) plan.


READ(RAINFALL);
WHILE RAINFALL <> 99999 DO
BEGIN
   WHILE RAINFALL < 0 DO
   BEGIN
      WRITELN('BAD INPUT, TRY AGAIN');
      READ(RAINFALL)
   END;
   TOTAL := TOTAL + RAINFALL;
   DAYS := DAYS + 1;
   READ(RAINFALL)
END;


Embedded Filter Loop Plan

WHILE x < min DO
BEGIN
   print error message
   READ x
END

sentinel guard plan
process input


BUG DESCRIPTION:

1. missing conditional (guard) on
   processing the input


Figure 6: Bug Description Assuming Embedded Filter Loop Plan


5 We do not want to argue about the best pedagogical strategy for interacting with the student; that in itself is a
very difficult question. The particular response shown is simply meant to illustrate one type of response to this
situation.

By this time the reader's intuition is surely saying that the correct analysis of the buggy
program in Figure 4 is that the programmer intended to implement an embedded filter loop
plan. The bug counts (3 for the skip-guard plan and 1 for the embedded filter loop
plan) provide quantitative support for this decision. However, we feel that the key in the
decision process, and the basis for our intuition, is our understanding of the student's
program provided by the plan analysis in Figure 6: thus, the bug categorization and bug count
follow from our understanding of the program, and not the other way around. We purposely
chose an example over which there would be little controversy. However, the point was (1) to
show how much reasoning we often do about programs implicitly, and (2) to show how different
bug categorization and bug counts could be as a function of the choice of intended underlying plan.

While the above decision was relatively clear, let us perturb the buggy code a bit further and
see how murky these types of decisions can, and do, become. In Figure 7 we show three
buggy program fragments; let us compare the bug categorization and bug counts using the two
alternative plans for each of the programs.

• Figure 7a

► Using the embedded filter loop plan we get the following bug differences:

1. the WHILE and IF keywords have been interchanged

2. there is a missing read for a new value

3. there is a missing guard on the subsequent input processing

► Using the skip-guard plan we get the following bug differences:

1. missing ELSE on the internal IF

• Figure 7b

► Using the embedded filter loop plan we get the following bug differences:

1. the WHILE and IF keywords have been interchanged

2. there is a missing guard on the subsequent input processing

► Using the skip-guard plan we get the following bug differences:

1. spurious READ

2. missing ELSE on the internal IF

• Figure 7c

► Using the embedded filter loop plan we get the following bug differences:

1. missing read for a new value

2. there is a missing guard on the subsequent input processing

► Using the skip-guard plan we get the following bug differences:

1. the WHILE and IF keywords have been interchanged

2. missing ELSE on the internal IF

We would argue that the programmer of the code in Figure 7a intended to encode a
skip-guard plan: again, the bug counts (3 for the embedded filter loop plan and 1 for the
skip-guard plan) support the intuition that it is more plausible that the programmer simply
left out an ELSE, as opposed to swapping keywords, etc. However, the programs in Figures 7b and
7c are not so easily analyzed: the bug counts are the same and the plausibility of the bug types is
reasonably similar. In order to make a reasoned decision we need to bring other evidence from
the program to bear. For example, in Figure 7b the programmer used a WHILE loop to correctly
implement the outer loop; this is some evidence that he understands how and when to use this
construct. Thus, we might be confident that the programmer really meant IF in the program in
Figure 7b. On the other hand, the inclusion of the spurious READ is unsettling. However, the
program in Figure 7c is certainly the most problematic: the bug counts are the same, the
plausibility of the bugs is similar, and the additional outside information is equivocal. The
moral of this example is that it can be exceedingly difficult to make decisions about plans, and
bugs, by simply looking at the code.
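The "fewest differences" heuristic behind the bug counts can be written down directly, which also makes its limits visible. The Python sketch below transcribes the difference descriptions given for Figure 4 (Figures 5 and 6) and for Figures 7a to 7c. It picks the plan with the fewest differences and reports `None` on a tie. The data structure and function names are illustrative. The authors argue that such counts follow from understanding the program and should not replace it.

```python
from typing import Dict, List, Optional

# Difference descriptions transcribed from the text: program -> plan -> differences.
DIFFERENCES: Dict[str, Dict[str, List[str]]] = {
    "Figure 4": {
        "skip-guard": ["WHILE used instead of IF", "extra READ in inner loop",
                       "missing ELSE"],
        "embedded filter loop": ["missing guard on input processing"],
    },
    "Figure 7a": {
        "embedded filter loop": ["WHILE and IF interchanged", "missing READ",
                                 "missing guard on input processing"],
        "skip-guard": ["missing ELSE"],
    },
    "Figure 7b": {
        "embedded filter loop": ["WHILE and IF interchanged",
                                 "missing guard on input processing"],
        "skip-guard": ["spurious READ", "missing ELSE"],
    },
    "Figure 7c": {
        "embedded filter loop": ["missing READ", "missing guard on input processing"],
        "skip-guard": ["WHILE and IF interchanged", "missing ELSE"],
    },
}


def bug_counts(program: str) -> Dict[str, int]:
    """Number of differences between the program and each candidate plan."""
    return {plan: len(diffs) for plan, diffs in DIFFERENCES[program].items()}


def fewest_differences(program: str) -> Optional[str]:
    """Plan with the fewest differences, or None when the counts tie."""
    counts = bug_counts(program)
    best = min(counts.values())
    winners = [plan for plan, n in counts.items() if n == best]
    return winners[0] if len(winners) == 1 else None


for program in DIFFERENCES:
    print(program, bug_counts(program), "->", fewest_differences(program))
```

The heuristic settles Figure 4 and Figure 7a but returns no answer for Figures 7b and 7c. For those two programs, outside evidence about the programmer's intention has to decide.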

The point of these latter examples is to illustrate how quickly the decision about what the
programmer intended gets murky, and how additional information outside the buggy area needs
to be brought to bear. We see again that for the programs in Figure 7 the bug categorization
and bug frequencies change depending on what decision is made about the programmer's
intention.


(a)

READ(RAINFALL);
WHILE RAINFALL <> 99999 DO
BEGIN
   IF RAINFALL < 0 THEN
      WRITELN('BAD INPUT, TRY AGAIN');
   TOTAL := TOTAL + RAINFALL;
   DAYS := DAYS + 1;
   READ(RAINFALL)
END;


(b)

READ(RAINFALL);
WHILE RAINFALL <> 99999 DO
BEGIN
   IF RAINFALL < 0 THEN
   BEGIN
      WRITELN('BAD INPUT, TRY AGAIN');
      READ(RAINFALL)
   END;
   TOTAL := TOTAL + RAINFALL;
   DAYS := DAYS + 1;
   READ(RAINFALL)
END;


(c)

READ(RAINFALL);
WHILE RAINFALL <> 99999 DO
BEGIN
   WHILE RAINFALL < 0 DO
      WRITELN('BAD INPUT, TRY AGAIN');
   TOTAL := TOTAL + RAINFALL;
   DAYS := DAYS + 1;
   READ(RAINFALL)
END;


Figure 7: Clouding the Waters: Additional Buggy Programs

Finally, the fact that the programs we have shown are novices' programs is really irrelevant to
the point in question: the problem is that the intention of the programmer affects the bug
categorization and the bug count. Quite reasonably, we would not expect a professional
programmer to mistake an IF for a WHILE. The observation that we would not expect this
particular confusion would in fact aid us in inferring the intention; it would not, we believe,
simply make the problem go away. In fact, we might well see buggy code such as that in Figures 4
and 7 from a professional programmer.

3. Methods for Specifying the Intention of a Program
In the above section, the basis for describing bugs was the difference between a program and 
the programming plans that specified a correct program. There are other methods of specifying 
the intention of a program: 

• I/O Behavior

• Programming Plans

• Corrected Version of the Buggy Program

• Program Description Language (PDL)

In what follows we will examine each of these in turn, and explore their good points and the bad 
points with respect to using a method as a basis for developing bug difference descriptions. 

I/O BEHAVIOR 

An I/O specification for the problem in Figure 1 would be quite close to the problem statement 
itself. The obvious problem with this method is its vagueness with respect to the code: many
different code fragments can misbehave in the same manner (e.g., there are many, many ways to
generate an infinite loop, but the I/O result is the same in all cases). One needs to be able
to make finer-grained distinctions than a comparison of the code against I/O specifications
alone allows.
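The Python sketch below shows two different fragments from this paper with identical I/O behavior on a failing input. It transcribes Figure 4 (an inner WHILE) and Figure 7b (an IF that re-reads). On a negative value followed by the sentinel, both average the sentinel in and produce the same result. An I/O comparison therefore cannot tell which code-level bug is present. The function names and the list-based input are illustrative assumptions.

```python
from typing import List, Tuple

SENTINEL = 99999
BAD_INPUT = "BAD INPUT, TRY AGAIN"


def figure4(inputs: List[int]) -> Tuple[List[str], int, int]:
    """Figure 4: embedded filter loop, no sentinel guard."""
    values = iter(inputs)
    messages: List[str] = []
    total = days = 0
    rainfall = next(values)
    while rainfall != SENTINEL:
        while rainfall < 0:
            messages.append(BAD_INPUT)
            rainfall = next(values)
        total += rainfall
        days += 1
        rainfall = next(values)
    return messages, total, days


def figure7b(inputs: List[int]) -> Tuple[List[str], int, int]:
    """Figure 7b: an IF (with a READ) where Figure 4 has the inner WHILE."""
    values = iter(inputs)
    messages: List[str] = []
    total = days = 0
    rainfall = next(values)
    while rainfall != SENTINEL:
        if rainfall < 0:
            messages.append(BAD_INPUT)
            rainfall = next(values)
        total += rainfall
        days += 1
        rainfall = next(values)
    return messages, total, days


# A negative value followed by the sentinel; the trailing 99999 answers the
# extra READ that both programs issue after processing the sentinel.
failing_input = [-1, 99999, 99999]
same_observed_failure = figure4(failing_input) == figure7b(failing_input)
```

The two programs do differ on other inputs: two negative values in a row are handled differently. Even so, the observed failure on this input is identical, which is the vagueness described above.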

PROGRAMMING PLANS

The major problem with this method is the need to guess what plan the programmer intended
to implement. However, once the decision is made, then describing the bug as a difference
between the plan and the code is relatively easy. One method of coping with the plan decision
problem is to interview the original programmers; this technique has been used to “validate”
change report data in several software monitoring projects (e.g., [12]). Unfortunately, in a class
of 200 students writing code at different terminals, interviewing subjects is a bit more
difficult.

The major benefit derived from building a bug description using this method is an accurate 
reporting of the cause of the bug. That is, clearly the goal of a bug taxonomy in which one 
captures bug type and bug frequency is the ability to pinpoint the sources of the bugs: one 
would like to know which bugs came from misunderstandings of the specifications document and 
which bugs arose from coding errors, etc. For example, in the previous section if we assumed 
that the programmer intended to implement a skip-guard plan then we would say that there 
were a number of coding level bugs (e.g., WHILE instead of IF, missing ELSE, spurious READ). 
However, if we assume that the programmer intended to implement an embedded filter loop
plan, then the source of the bug may be a problem of specification interpretation: the
programmer may not have thought that someone would ever input the sentinel value after
inputting an illegal (negative) value. Thus he felt no need to guard subsequent computation. (An
interview with the programmer would be particularly useful in this specific case.) Thus, bug
categorization and bug origin are directly influenced by the choice of underlying plan structure in
the buggy program.

CORRECTED VERSION OF THE BUGGY PROGRAM



The typical method of describing a bug is to compare the original buggy program with the
corrected version of that program (e.g., [12, 7, 1]). While there is no guessing as to the intention
of the original programmer, we see 2 basic problems with this approach:

• The choice of the particular corrected program used as the measure is relatively
arbitrary. That is, there are few hard guidelines for making changes to code. Thus,
different programmers could well take the same buggy program and correct it in
different ways. This would result in two different bug descriptions: an intuitively
unsatisfactory situation. Moreover, different bug descriptions could lead to different
conclusions as to the origins of the bugs, which, after all, is the point of doing the
bug categorization in the first place. For example, if the buggy program in Figure 4
were corrected by implementing a skip-guard plan, then the difference between
the buggy program and the corrected program would result in a bug description
containing 3 coding level bugs. On the other hand, if the program is corrected by
putting in a guard around the subsequent computation to protect against a sentinel
value, then the bug description would only contain 1 bug, a missing conditional
(guard plan), which may or may not be a coding level bug (as discussed above).

While we might prefer the programmer to make the latter change, there is no way to
guarantee this situation.

Interviewing the original programmer might shed some light on his intentions, and
guide the subsequent bug analysis or even bug correction. However, this additional,
programmer-supplied information goes beyond the corrected program, and
approaches a bug description based on the programmer's original plan. While we have
some methodological reservations about using interviews collected after the fact, 6 the
main issue is that information gotten from the interview is of a different sort than the
information gotten from the corrected program: the former information is
much more akin to the programming plans described above.

• What is actually counted can be quite problematic. For example, if we correct the
buggy program in Figure 7c by adding the missing ELSE, we also need to add a
BEGIN-END block around the running total update and the counter update. Should
we count this as 1 bug or 2 bugs? It seems unfair to count the BEGIN-END block
against the programmer, since this change is required by the “real” change. On the
other hand, however, in the next section we will show programs in which the “real”
bug is a missing BEGIN-END block. Thus, it is not inconceivable that a programmer
could add the ELSE in Figure 7c, but forget to put in the now necessary BEGIN-END
block. What one counts is a tricky issue.
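How much the chosen correction shapes the bug description can be measured even with a crude diff. The Python sketch below compares Figure 4 against two hypothetical corrections. One adds a sentinel guard and keeps the embedded filter loop; the other rewrites the loop as a skip-guard plan in the style of Figure 2. The measure, `line_difference`, is an illustrative assumption: it counts lines present in one version but not the other, ignoring indentation and order. It is not a validated bug count.

```python
from collections import Counter

BUGGY = """
READ(RAINFALL);
WHILE RAINFALL <> 99999 DO
BEGIN
  WHILE RAINFALL < 0 DO
  BEGIN
    WRITELN('BAD INPUT, TRY AGAIN');
    READ(RAINFALL)
  END;
  TOTAL := TOTAL + RAINFALL;
  DAYS := DAYS + 1;
  READ(RAINFALL)
END;
"""

# Hypothetical correction 1: keep the embedded filter loop, add a sentinel guard.
GUARD_FIX = """
READ(RAINFALL);
WHILE RAINFALL <> 99999 DO
BEGIN
  WHILE RAINFALL < 0 DO
  BEGIN
    WRITELN('BAD INPUT, TRY AGAIN');
    READ(RAINFALL)
  END;
  IF RAINFALL <> 99999 THEN
  BEGIN
    TOTAL := TOTAL + RAINFALL;
    DAYS := DAYS + 1;
    READ(RAINFALL)
  END
END;
"""

# Hypothetical correction 2: rewrite the loop body as a skip-guard plan.
SKIP_GUARD_FIX = """
READ(RAINFALL);
WHILE RAINFALL <> 99999 DO
BEGIN
  IF RAINFALL < 0 THEN
    WRITELN('BAD INPUT, TRY AGAIN')
  ELSE
  BEGIN
    TOTAL := TOTAL + RAINFALL;
    DAYS := DAYS + 1;
  END;
  READ(RAINFALL)
END;
"""


def line_difference(a: str, b: str) -> int:
    """Lines present in one version but not the other (indentation and order ignored)."""
    ca = Counter(line.strip() for line in a.splitlines() if line.strip())
    cb = Counter(line.strip() for line in b.splitlines() if line.strip())
    return sum(((ca - cb) + (cb - ca)).values())


print("guard fix:", line_difference(BUGGY, GUARD_FIX))
print("skip-guard fix:", line_difference(BUGGY, SKIP_GUARD_FIX))
```

Both corrections repair the sentinel bug, yet by this measure the guard fix differs from the buggy program in 3 lines and the skip-guard rewrite in 6. The recorded "size" of the bug depends on which correction was made, not only on the original program.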

* The problems with using interview data have received significant attention in psychology. For example, Ericsson
and Simon [4] have argued that one can reliably use only verbal information given by the subject while the subject is
doing the task. They argue that such a concurrent verbal report is effectively an on-line dump from short-term
memory. In contrast, a report after the fact could be a story about what the subject thought he was thinking, and
significant distortions can occur in this type of situation. While one might arguably feel that the Ericsson and
Simon position is a bit extreme, it nonetheless seems only prudent to exercise care in interpreting interview data.

The upshot of these two problems with categorising and counting bugs based on a corrected
version of the program was suggested above: one is less confident of the origin of the bugs, and
thus less confident about the percentages of bugs with those origins. Depending on the particular
corrected solution and the particular choice of counting scheme, one could paint a picture of a
program that contained many more coding-level errors, say, than specification-based errors. The
worst part of this situation is that we would not have a good way of knowing how right or wrong
this analysis was, since we don't know how the bug categories and counts would have turned
out if a different corrected version were used as the basis for difference descriptions.
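A difference description can be mechanized, crudely, as a line-by-line diff. The Python sketch below uses the standard difflib module on a hypothetical toy fragment, not one of the programs in the figures. It shows the counting problem: two corrections with the same run-time behavior yield different change counts. Wrapping the NUM > 0 case in a BEGIN-END pair produces two inserted hunks, while guarding the counter update directly produces one replaced line.

```python
import difflib


def difference_description(buggy, corrected):
    """List the non-matching regions (hunks) between two versions of a
    program, each given as a list of source lines."""
    matcher = difflib.SequenceMatcher(a=buggy, b=corrected, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


# Hypothetical toy fragment: COUNT1 is updated even when NUM <= 0.
buggy = [
    "IF (NUM > 0) THEN",
    "SUM := SUM + NUM;",
    "COUNT1 := COUNT1 + 1;",
    "IF (NUM = 0) THEN",
    "COUNT2 := COUNT2 + 1;",
]

# Correction A: group the NUM > 0 case with a BEGIN-END pair.
fix_a = buggy[:1] + ["BEGIN"] + buggy[1:3] + ["END;"] + buggy[3:]

# Correction B: guard the counter update on its own line instead.
fix_b = buggy[:2] + ["IF (NUM > 0) THEN COUNT1 := COUNT1 + 1;"] + buggy[3:]

for name, fixed in [("A", fix_a), ("B", fix_b)]:
    hunks = difference_description(buggy, fixed)
    print(name, len(hunks), [tag for tag, *_ in hunks])
```

Running it prints `A 2 ['insert', 'insert']` and `B 1 ['replace']`. Neither number says anything about where the bug came from; the count is a property of the chosen correction.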

PROGRAM DESCRIPTION LANGUAGE (PDL) 

PDLs come in all flavors: some are very close to the code, while others are higher level
and closer to a plan-level description. The former type of PDL would suffer from the same problems as
using a corrected version as the standard. The latter type would suffer from the problems
associated with using the programming plans as the standard.


4. An Extended Example

Let us now consider an actual example from the on-line protocol data. In Figure 8 we depict
the problem the students were trying to solve; in Figure 9 the program on the left is a buggy
program generated by a student in our study. If we take a "local view" of the bugs in this
program, we can generate a corrected version as shown in Figure 9 (right-hand side). Notice that
if we do a difference description between the corrected and the buggy versions, we come up
with 8 changes:

• The rainyday counter, COUNT1, will always be updated; in order to correct for
the times when a negative rainfall is input, we need to decrement COUNT1. Thus, [1]
added a begin-end block after the (NUM < 0) test, and [2] added a decrement of the
rainyday counter.

• COUNT2 must be made to contain the number of rainy (not just valid) days.
COUNT2 keeps track of the non-rainy valid days in the loop. Thus, we need to
subtract the non-rainy days (COUNT2) from the total valid days (COUNT1) in order
to get the number of rainy days: [3] changed the addition of COUNT1 and COUNT2 to
the subtraction of COUNT2 from COUNT1.

• The guard on the average calculation is incorrect. Thus, [4] changed the guard on the
average calculation to COUNT1.

• The divisor in the average calculation should be the valid day counter, COUNT1, not
the valid but non-rainy day counter, COUNT2. Thus, [5] changed COUNT2 to
COUNT1 in the divisor of the average calculation.

• If there is no valid input, the program should neither calculate nor print the average,
and it should not print the maximum either. Thus, [6] added a begin-end block after the
division guard around the average calculation and output statements.

• The WRITELNs give a message about what should be output; in order to make the
message agree with the actual output, the variables need to be changed: [7] the valid
day counter needs to be COUNT1, while [8] the rainy day counter needs to be COUNT2.

Given the number of changes that need to be made to the counters (COUNT1 and COUNT2), it
would appear that the student has some confusion over the roles of the two counters.




The Noah Problem: Noah needs to keep track of the rainfall in the New Haven area to determine
when to launch his ark. Write a program which he can use to do this. Your program should read
the rainfall for each day, stopping when Noah types "99999", which is not a data value, but a
sentinel indicating the end of input. If the user types in a negative value the program should
reject it, since negative rainfall is not possible. Your program should print out the number of
valid days typed in, the number of rainy days, the average rainfall per day over the period, and
the maximum amount of rainfall that fell on any one day.

Figure 8: The Noah Problem: A First Looping Problem




BUGGY EXAMPLE

BEGIN
WRITELN ('PLEASE INPUT AMOUNT OF RAINFALL');
READLN;
READ(NUM);
COUNT1 := 0;
COUNT2 := 0;
SUM := 0;
HIGHNUM := 0;
WHILE (NUM <> SENTINAL) DO
  BEGIN
    IF (NUM > 0)
      THEN
        SUM := SUM + NUM;
        COUNT1 := COUNT1 + 1;
        IF (NUM > HIGHNUM)
          THEN
            HIGHNUM := NUM;
    IF (NUM = 0)
      THEN
        COUNT2 := COUNT2 + 1;
    IF (NUM < 0)
      THEN
        WRITELN ('ILLEGAL INPUT. INPUT NEW VALUE');
    READLN;
    READ(NUM);
  END;
COUNT2 := COUNT2 + COUNT1;
IF (NUM > 0)
  THEN
    TOTAL := SUM/COUNT2;
WRITELN ('AVERAGE RAINFALL WAS ', TOTAL, ' INCHES PER DAY');
WRITELN ('HIGHEST RAINFALL WAS ', HIGHNUM, ' INCHES');
WRITELN (COUNT2, ' VALID DAYS WERE ENTERED');
WRITELN (COUNT1, ' RAINY DAYS IN THIS PERIOD');
END.


CORRECTED VERSION

BEGIN
WRITELN ('PLEASE INPUT AMOUNT OF RAINFALL');
READLN;
READ(NUM);
COUNT1 := 0;
COUNT2 := 0;
SUM := 0;
HIGHNUM := 0;
WHILE (NUM <> SENTINAL) DO
  BEGIN
    IF (NUM > 0)
      THEN
        SUM := SUM + NUM;
        COUNT1 := COUNT1 + 1;
        IF (NUM > HIGHNUM)
          THEN
            HIGHNUM := NUM;
    IF (NUM = 0)
      THEN
        COUNT2 := COUNT2 + 1;
    IF (NUM < 0)
      THEN
        BEGIN                                     (* added this line *)
          COUNT1 := COUNT1 - 1;                   (* added this line *)
          WRITELN ('ILLEGAL INPUT. INPUT NEW VALUE');
        END;                                      (* added this line *)
    READLN;
    READ(NUM);
  END;
COUNT2 := COUNT1 - COUNT2;                        (* changed this line *)
IF (COUNT1 > 0)                                   (* changed this line *)
  THEN
    BEGIN                                         (* added this line *)
      TOTAL := SUM/COUNT1;                        (* changed this line *)
      WRITELN ('AVERAGE RAINFALL WAS ', TOTAL, ' INCHES PER DAY');
      WRITELN ('HIGHEST RAINFALL WAS ', HIGHNUM, ' INCHES');
    END;                                          (* added this line *)
WRITELN (COUNT1, ' VALID DAYS WERE ENTERED');     (* changed this line *)
WRITELN (COUNT2, ' RAINY DAYS IN THIS PERIOD');   (* changed this line *)
END.


[1] added a begin-end block after (NUM < 0) test, and [2] added a decrement of the rainyday counter

[3] changed addition of COUNT1 and COUNT2 to subtraction of COUNT2 from COUNT1

[4] changed guard on average calculation to COUNT1

[5] changed COUNT2 to COUNT1 in the divisor of the average calculation

[6] added a begin-end block after division guard around average calculation and output statements

[7] the valid day counter needs to be COUNT1, while the [8] rainy day counter needs to be COUNT2


Figure 9: A Buggy Program and a Corrected Version




However, consider now a different corrected version of this buggy program, as depicted in
Figure 10. A difference description between the buggy version and the corrected version yields the
following set of bugs:

• We can make COUNT1 keep track of only the rainy days; this is consistent with code
already in the program. The line that adds COUNT2 and COUNT1 now makes sense,
since COUNT2 now keeps track of the valid days, and the divisor in the average
calculation suggests that COUNT2 should be the valid day counter. In order to make
COUNT1 perform in this manner, we need to [1] add a begin-end pair around all
computation after the NUM > 0 test, up to the NUM = 0 test.

• If there is no valid input, the program should neither calculate nor print the average,
and it should not print the maximum either. Thus, we need to [2] add a begin-end block
after the division guard around the average calculation and output statements.

• The guard on the average calculation is incorrect. Thus, [3] changed the guard on the
average calculation to COUNT1.

Which description should we choose? And why? Notice that neither of the corrected versions
is unreasonable. However, it seems to us that one should choose the second bug
description over the first. The basis for that decision is the hypothesised plan structure
underlying the buggy version: it appears to us that the student was trying to structure the
actions in the main loop in terms of cases. For example, the program explicitly tested for
NUM > 0, NUM = 0, and NUM < 0 and took the appropriate actions, almost. In order to make
the case structure work, the code following the NUM > 0 test up to the NUM = 0 test should be
grouped together. While one cannot put too much faith in the indentation of a novice's
program,⁷ it appears that the indentation supports this analysis. Thus, what is missing from the
main loop is a begin-end pair surrounding the code between the NUM > 0 test and the
NUM = 0 test. On this analysis, the student does not have a misunderstanding surrounding the two
counters, but rather has a coding-level misunderstanding about how to block code together.
Moreover, this same misunderstanding can explain the lack of a begin-end pair surrounding the
average calculation and the next two write statements. The reduced bug count in the second
description follows directly from this analysis: in effect there are only 3 bugs in this program, 2
of which have the same underlying origin.

This example illustrates a point made earlier: the bug categorization and bug count follow
from an understanding of the program that is provided by the hypothesized plan structure of
the program. That is, to understand a buggy program, one must make inferences about what
plan structure the programmer intended to implement; the program only "makes sense" in terms
of these plan descriptions.
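For comparison, the case plan that this analysis attributes to the student can be written with each case in its own block. The Python sketch below is our own illustration, not code from the study. It assumes the sentinel value 99999 and takes readings from a list rather than from READ. Each of the three tests (NUM > 0, NUM = 0, NUM < 0) owns a complete block, and the valid-day and rainy-day counters are kept separately, so their roles never overlap.

```python
SENTINEL = 99999  # assumed sentinel, per the problem statement in Figure 8


def noah(readings):
    """Summarize daily rainfall using one case per kind of input.

    Returns (valid_days, rainy_days, average, highest); average and
    highest are None when no valid reading was entered.
    """
    valid_days = rainy_days = 0
    total = highest = 0.0
    for amount in readings:
        if amount == SENTINEL:
            break
        if amount > 0:          # rainy day: every update for this case is grouped
            valid_days += 1
            rainy_days += 1
            total += amount
            highest = max(highest, amount)
        elif amount == 0:       # valid but dry day
            valid_days += 1
        else:                   # negative rainfall is rejected, not counted
            print("ILLEGAL INPUT. INPUT NEW VALUE")
    if valid_days == 0:
        return 0, 0, None, None
    return valid_days, rainy_days, total / valid_days, highest


print(noah([0.5, 0, -1, 1.5, SENTINEL, 7]))
```

The call above rejects the -1 and reports 3 valid days, 2 rainy days, an average of about 0.67 and a maximum of 1.5. The 7 after the sentinel is never examined.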


7 We have observed in the on-line protocols that the physical layout of a student's program suffers as the student
makes changes to his program in the process of debugging it.



BUGGY EXAMPLE

BEGIN
WRITELN ('PLEASE INPUT AMOUNT OF RAINFALL');
READLN;
READ(NUM);
COUNT1 := 0;
COUNT2 := 0;
SUM := 0;
HIGHNUM := 0;
WHILE (NUM <> SENTINAL) DO
  BEGIN
    IF (NUM > 0)
      THEN
        SUM := SUM + NUM;
        COUNT1 := COUNT1 + 1;
        IF (NUM > HIGHNUM)
          THEN
            HIGHNUM := NUM;
    IF (NUM = 0)
      THEN
        COUNT2 := COUNT2 + 1;
    IF (NUM < 0)
      THEN
        WRITELN ('ILLEGAL INPUT. INPUT NEW VALUE');
    READLN;
    READ(NUM);
  END;
COUNT2 := COUNT2 + COUNT1;
IF (NUM > 0)
  THEN
    TOTAL := SUM/COUNT2;
WRITELN ('AVERAGE RAINFALL WAS ', TOTAL, ' INCHES PER DAY');
WRITELN ('HIGHEST RAINFALL WAS ', HIGHNUM, ' INCHES');
WRITELN (COUNT2, ' VALID DAYS WERE ENTERED');
WRITELN (COUNT1, ' RAINY DAYS IN THIS PERIOD');
END.


ANOTHER CORRECTED VERSION

BEGIN
WRITELN ('PLEASE INPUT AMOUNT OF RAINFALL');
READLN;
READ(NUM);
COUNT1 := 0;
COUNT2 := 0;
SUM := 0;
HIGHNUM := 0;
WHILE (NUM <> SENTINAL) DO
  BEGIN
    IF (NUM > 0)
      THEN
        BEGIN                                     (* added this line *)
          SUM := SUM + NUM;
          COUNT1 := COUNT1 + 1;
          IF (NUM > HIGHNUM)
            THEN
              HIGHNUM := NUM;
        END;                                      (* added this line *)
    IF (NUM = 0)
      THEN
        COUNT2 := COUNT2 + 1;
    IF (NUM < 0)
      THEN
        WRITELN ('ILLEGAL INPUT. INPUT NEW VALUE');
    READLN;
    READ(NUM);
  END;
COUNT2 := COUNT2 + COUNT1;
IF (COUNT1 > 0)                                   (* changed this line *)
  THEN
    BEGIN                                         (* added this line *)
      TOTAL := SUM/COUNT2;
      WRITELN ('AVERAGE RAINFALL WAS ', TOTAL, ' INCHES PER DAY');
      WRITELN ('HIGHEST RAINFALL WAS ', HIGHNUM, ' INCHES');
    END;                                          (* added this line *)
WRITELN (COUNT2, ' VALID DAYS WERE ENTERED');
WRITELN (COUNT1, ' RAINY DAYS IN THIS PERIOD');
END.


• [1] add a begin-end pair around all computation after NUM > 0 test, up to the NUM = 0 test

• [2] add a begin-end block after division guard around average calculation and output statements

• [3] changed guard on average calculation to COUNT1

Figure 10: A Buggy Program and an Alternative Corrected Version




6. Concluding Remarks

We have argued that a bug description is a difference description between the realisation and
the intention specification. We have presented a number of techniques for specifying the intention
and have pointed out the problems associated with each type of specification in developing an
accurate picture of bug types and bug frequency. While no technique is without its problems, we
have argued that the understanding provided by a plan analysis of the buggy program stands a
better chance than the other techniques of providing an accurate categorisation
and count of the bugs, and thus a more accurate reflection of the origins of the bugs.




References 


1. Basili, V., Perricone, B. Software Errors and Complexity: An Empirical Investigation. Tech.
Rept. TR-1195, University of Maryland, Dept. of Computer Science, 1982.

2. Bonar, J. Understanding the Novice Programmer. Dissertation, in preparation.

3. Ehrlich, K., Soloway, E. An Empirical Investigation of the Tacit Plan Knowledge in
Programming. In Human Factors in Computer Systems, J. Thomas and M.L. Schneider (Eds.),
Ablex Inc., in press.

4. Ericsson, A. and Simon, H. "Verbal Reports as Data." Psychological Review 87 (1980),
215-251.

5. Johnson, L., Draper, S., Soloway, E. The Nature of Bugs in Novices' Pascal Programs. In
preparation.

6. Miller, L. A. "Natural Language Programming: Styles, Strategies, and Contrasts." IBM
Systems Journal 20 (1981), 184-215.

7. Ostrand, T., Weyuker, E. Collecting and Categorising Software Error Data in an Industrial
Environment. Tech. Rept. 47, New York University, Dept. of Computer Science, 1982.

8. Rich, C. Inspection Methods in Programming. Tech. Rept. AI-TR-604, MIT AI Lab, 1981.

9. Soloway, E., Ehrlich, K., Bonar, J., Greenspan, J. What Do Novices Know About
Programming? In A. Badre, B. Shneiderman (Eds.), Directions in Human-Computer Interactions,
Ablex, Inc., 1982.

10. Soloway, E., Bonar, J., Ehrlich, K. Cognitive Strategies and Looping Constructs: An
Empirical Study. Communications of the ACM, in press.

11. Soloway, E., Rubin, E., Woolf, B., Bonar, J., Johnson, L. MENO-II: An Intelligent
Programming Tutor. Journal of Computer-Based Instruction, to appear.

12. Weiss, D. Evaluating Software Development By Analysis of Change Data. Tech. Rept.
TR-1120, University of Maryland, Dept. of Computer Science, 1981.





THE VIEWGRAPH MATERIALS 
for the 

W. JOHNSON/S. DRAPER/E. SOLOWAY PRESENTATION 
WERE INCORPORATED IN THE PAPER 





N83-32366


ERROR TAXONOMY

WHAT CAN BE GAINED?


by


D. E. Buckland
Reifer Consultants, Inc.

December 1, 1982


D. Buckland 
Reifer Cons.
1 of 28 





System development has been and continues to be an evolutionary
process. Technology is rapidly catching up with the science
fiction writers of yesterday. We see some form of computer in
just about all of our equipment, including cars, watches,
cameras, home appliances, weapons systems, communications
devices, spaceships, etc. In the good old days, hardware did it
all. Today more and more capabilities are being fashioned by
some form of software, and computers are becoming smaller, more
powerful and far more complex. We've recognized that, no matter
what our task is, experience is our best teacher. In the field of
system development we'd like to profit not only from our own
experiences, but also from the experiences of our fellow computer
scientists.

In order to do this, we need a history of what we've done. We can
accomplish this by implementing some of the formal procedures and
documentation requirements from the older, hardware side of the
house. In order to profit from our mistakes, we need to keep
track of what went wrong, and what was done to correct each
situation. One technique used to accomplish this is to implement
an error taxonomy.

Exactly what is an error taxonomy? Simply stated, it is the
classification and quantification of errors. Numerous studies
have been conducted in an attempt to provide quantitative data on
errors that occurred in relatively large systems. The study of
errors is important for the following reasons:

o A major item impacting costs, risks and uncertainty in
system development is the lack of knowledge of what causes
errors, why they occur and how they can be reduced (or at
least located more quickly). The development of error data
bases for systems is a step towards the statistical
quantification of error occurrence. Once error occurrences
can be quantified, steps can be taken to reduce them.

o Identification of relationships between error occurrences,
causes, criticality and time of error occurrence can lead to
improved methods of detecting errors before they become
difficult and costly to correct.

o Reliable error data can be used to measure the impact (both
positive and negative) of modern software development and
validation methodologies and tools on quality and
productivity.

o The formal error documentation process forced by error data
collection itself can provide better error control and help
assure appropriate corrective actions are taken.

Errors can be categorized in a number of ways. The key is to
define categories that are useful and applicable to the
application. The more common categories are:



o Time of occurrence 
o Level of criticality 
o Error type 
o Time of introduction 

The main reason for reporting problems is so that each problem
can be resolved in a timely fashion. During system development
and subsequent use, problems are found and reported regularly.
If a formal reporting process is not used, even in a one-man job,
some problems fall by the wayside, and linger to make themselves
known at some inconvenient time in the future. Programmer X
discovers a problem in programmer Y's code and, with full
intentions of telling him (or her) about it as soon as his test
time is finished, becomes involved in another problem, or runs
off to a meeting, and forgets. Or how many times have we heard
"Such and such doesn't work correctly" with no indication of what
was being done or what was expected? Much time and effort must
then be expended on investigation prior to resolution.

In order to identify and solve problems in a timely fashion, a
clean, simple problem reporting mechanism is required. Using
such a mechanism, problem status reports can be produced that
enable management and staff alike to evaluate what is left to be
done, assign priorities so that the more painful items are taken
care of first, and group similar problems together for expeditious
handling. When problem reports are up to date, test coverage can
be maximized by staying clear of known problem areas,
concentrating on new territory and reducing duplication. When
thorough problem reports are required, test objectivity increases
because test conditions must be substantiated. The problem
report itself serves as a form of communication between reporter
and resolver, and problem turnaround improves. A careful
analysis of problem status reports can identify weak areas, spot
trends and enable the application of past experiences in the
future.

The reporting mechanism must include the filling out and
gathering of problem reports, enable expedient investigation,
archive the resolution and enable problem evaluation. All of
this should be accomplished with a minimum of clerical time. A
key point to remember is to gather enough data at the time so
that information you may need in the future is readily available.

When implementing a problem reporting system, several factors
need to be considered beforehand. The first is to define a
common set of terms so that all involved with the system are
speaking the same language. Establish and publish a list of
keywords, acronyms and abbreviations. Next one should design a
problem reporting form. This should be kept to one page and
should make use of checkboxes where practical. Plenty of space
should be provided for both the problem symptom and the
resolution. Allow for problems to be reported against a
baseline, with all deviations from the baseline noted (patches,
etc.). One central point of control should be maintained, where
new problems can be logged open, and resolved problems closed.


This may be as simple as a notebook or as complex as an automated
system. Of prime importance is to assure that the system is
flexible and growth oriented. It is much easier to gather data
in real time than to acquire it from the memories of those
involved when the project is completed.

Information on problems is usually collected in serial fashion.
When a problem is discovered, the following is needed:

o Who found the problem? Should a question arise as to the
nature of the bug, facts not included in the report itself,
interpretation of the test, recreation of the problem, etc.,
it will be necessary to speak with the reporter.

o When was this problem found? Recording this date enables
the analyst to arrive at such facts as what phase of the
life cycle this occurred in, how long the problem has been
open and how long it took to resolve, and also to track how
many problems were opened during given phases.

o What happened? The reporter should detail the exact
symptoms whenever possible. This includes, but is not
limited to, the system identification, hardware and software
configurations, test case, inputs, test programs, expected
outputs or reactions, etc. There should be enough detail to
enable the programmer to recreate or pinpoint the problem.
Remember, it is entirely possible that one problem can have
several symptoms.

o What was being used? The system the problem occurred on,
along with any test equipment, should be identified. This
will enable the programmer to determine whether the problem
is configuration dependent, or possibly caused by a hardware
failure.

o Is this a reoccurrence of a previously closed problem? This
would indicate that a problem may have occurred in
configuration management, or that all of the causes had not yet
been discovered.

o What is the level of criticality? The category must take
into consideration whether or not the problem itself is
mission critical, prevents further checkout of mission
critical areas of the system, will involve a lot of rework
and impact schedule, is cosmetic in nature, etc. The level
of criticality is not always evident when the problem is
originally reported, but may change as investigation reveals
the mitigating conditions.

When a problem is resolved, the appropriate historical
information should be recorded. Analysts will need to know:

o Why did it fail? The clinical reasons for the failure must
be recorded. The modules and interfaces involved should be
noted. The exact cause should be given, whether it was an
error or oversight in the requirements, a design failure,
coding error, test error, human operation fault, etc. This
information will allow the analyst to identify error trends
and weak areas, and suggest recovery actions.

o What was the solution? Exactly what was done to resolve the 
problem? This might be to correct a piece of documentation, 
revise the code, or even do nothing at all. Depending on 
when a problem is found, it is sometimes more costly and 
more risky to fix it than to work around it. 

o Who supplied the resolution? Should questions arise in the 
future, this is the person to whom they will be directed. 

o When was it closed? The presence of this date indicates
that the problem is not active, and will not be included in
the "current open" count. It also enables time information
to be extracted.
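The discovery and resolution questions above map naturally onto a single record. The Python sketch below is a hypothetical layout; the field names are ours, not those of any particular reporting system. It treats a report as open until a closing date is recorded, which yields both the "current open" count and the time each problem took to resolve.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ProblemReport:
    # Recorded when the problem is found (hypothetical field names).
    report_id: str
    found_by: str
    date_found: date
    symptoms: str
    configuration: str
    criticality: str                      # e.g. "A", "B", "C" or "U"
    recurrence_of: Optional[str] = None   # id of a previously closed report
    # Recorded when the problem is resolved.
    cause: Optional[str] = None
    solution: Optional[str] = None
    resolved_by: Optional[str] = None
    date_closed: Optional[date] = None

    def is_open(self) -> bool:
        """A report stays open until a closing date is recorded."""
        return self.date_closed is None

    def days_open(self, today: date) -> int:
        """Days from discovery to closure, or to `today` if still open."""
        end = self.date_closed if self.date_closed is not None else today
        return (end - self.date_found).days


def current_open_count(reports) -> int:
    """Number of reports that have no closing date yet."""
    return sum(1 for r in reports if r.is_open())
```

Keeping the closing date as the only "closed" marker means the open count and the turnaround time cannot disagree with each other.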

During the time that a problem is open, it may prove helpful to
give it a status, such as new, patched, reported fixed on a
certain baseline, retry, recreate, revised, etc. These can
indicate to those using the report what actions need to be taken
to close the problem. For instance, a problem that is
categorized as critical, but has not been reproducible, would
carry a recreate status to indicate that the programmer wishes to
be informed immediately when the problem recurs. Or a problem
reported as fixed on a given baseline should be validated prior
to its being officially closed.
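Status rules like these can be enforced mechanically. In the Python sketch below, the status names and allowed transitions are hypothetical. They loosely follow the examples in the text, plus an assumed "validated" step. The one rule taken directly from the text is that a report marked as fixed on a baseline must be validated before it can be closed.

```python
# Hypothetical status workflow; the transition table is illustrative only.
ALLOWED = {
    "new": {"patched", "recreate", "reported fixed"},
    "recreate": {"new", "reported fixed"},      # waiting for the problem to recur
    "patched": {"reported fixed", "retry"},
    "retry": {"patched", "reported fixed", "recreate"},
    "reported fixed": {"validated", "retry"},   # a fix must be checked first
    "validated": {"closed"},
    "closed": set(),
}


def advance(current: str, new: str) -> str:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"cannot move a report from {current!r} to {new!r}")
    return new


status = "new"
for step in ["reported fixed", "validated", "closed"]:
    status = advance(status, step)
print(status)
```

A table of allowed moves keeps the policy in one place. Tightening or loosening it, for example adding "revised", changes only the dictionary.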

Now that we have the ability to collect all of this fine data,
what can it tell us? By way of example, let me share with you a
study, conducted by Reifer Consultants, Inc., of errors
reported during the development and use of the Deep Space
Network/3 in preparation for the development of the Deep Space
Network/4.

The problem reports for this program were initially meant to
indicate to the programmers that a problem existed, and not much
more. In preparation for this study, a team of analysts
evaluated existing taxonomies and, with a little embellishment,
developed a taxonomy applicable to this JPL project. A
three-dimensional classification scheme was devised to capture
meaningful error data in a manner suitable for additional
statistical and trend analysis. Each of the dimensions is
summarized below:

o Time of occurrence - Defines in which of the four DSN phases
of the software life cycle the error occurred. The four
times were: Development, Verification, Acceptance or
Transfer.

o Criticality - Defines which level of severity the error
could be categorized in. The three levels of severity were:
Critical, Dangerous and Minor.

o Category - Categorizes the cause of the error. The ten
error types were: Computation, Logic, Data handling,
Interface, Data base, Operation, Requirements incorrect,
Design, Clerical and Other.

(Because it is important to precisely define terminology, I have
enclosed a detailed description of the taxonomy as an appendix to
this paper.)

The same team of analysts then analyzed approximately Hit
problem reports, and interviewed people involved with the
project in an attempt to fill in the blanks. Using the DSN/RCI
software error taxonomy, each problem report was categorized in
terms of its category, criticality and time of occurrence.

A preliminary analysis of the resulting data base was performed.
Summaries of the data were compiled and evaluated so that
recommendations for improvement could be formulated. Histograms
were used to identify apparent trends and conclusions without
resorting to a detailed statistical analysis. The histograms
combine error data within an accuracy range of plus or minus 1%.
Three histograms follow, along with a discussion of the
observations. To simplify the graphs, the common abbreviations
listed in Table 1 were used.


Table 1

ABBREVIATIONS/ACRONYMS


o Time of Occurrence

    D - Development - design, coding and unit test
    V - Verification - integration and testing of subsystem
    A - Acceptance - formal testing and acceptance of subsystem
    T - Transferred - software subsystem operational
    U - Unknown

o Criticality Levels

    A - Critical
    B - Dangerous
    C - Minor
    U - Unknown

o Error Category

    CO - Computational Error
    LO - Logic Error
    DH - Data Handling Error
    IN - Interface Error
    DB - Data Base Error
    OP - Operation Error
    RI - Requirements Incorrect
    DE - Design Error
    CL - Clerical Error
    OT - Other
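The three-dimensional scheme and the codes in Table 1 can be captured as enumerations, so that every report is classified on all three axes with a fixed vocabulary. This Python sketch is illustrative only; the enum and function names are ours, not part of the DSN/RCI taxonomy.

```python
from enum import Enum


class Occurrence(Enum):
    D = "Development"
    V = "Verification"
    A = "Acceptance"
    T = "Transferred"
    U = "Unknown"


class Criticality(Enum):
    A = "Critical"
    B = "Dangerous"
    C = "Minor"
    U = "Unknown"


class Category(Enum):
    CO = "Computational"
    LO = "Logic"
    DH = "Data Handling"
    IN = "Interface"
    DB = "Data Base"
    OP = "Operation"
    RI = "Requirements Incorrect"
    DE = "Design"
    CL = "Clerical"
    OT = "Other"


def classify(time_code, crit_code, cat_code):
    """Map a coded report such as ("V", "A", "DE") onto the three dimensions.

    Lookup is by member name, so an unrecognized code raises KeyError.
    """
    return Occurrence[time_code], Criticality[crit_code], Category[cat_code]


print(classify("V", "A", "DE"))
```

Rejecting unknown codes, rather than accepting free text, is what keeps later tallies comparable across reports.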


A histogram illustrating errors by time of occurrence (Figure 1)
was produced. The undefined time occurrences resulted from
problem reports which had no time of occurrence and for which no
time of occurrence could be ascertained. The observations we can
make based on this histogram are as follows:


o The data seems to indicate that formal problem reporting
procedures were not strictly enforced during the development
of most of the subsystems investigated by this study.

o The software verification and acceptance testing processes
uncovered a large number of errors. Unfortunately, there
were still many more errors not discovered until the
subsystem was placed in operation.
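Tallying coded reports into a text histogram like Figure 1 takes only a counter. The Python sketch below is a minimal illustration on made-up codes, not the DSN data. It keeps a bar for the U (unknown) class so that reports with no ascertainable time of occurrence remain visible.

```python
from collections import Counter

TIME_CODES = ["D", "V", "A", "T", "U"]  # Table 1 time-of-occurrence codes


def histogram(codes, order=TIME_CODES, width=40):
    """Return one text bar per code, scaled so the largest count spans `width` marks."""
    counts = Counter(codes)
    peak = max(counts.values(), default=0)
    rows = []
    for code in order:
        n = counts[code]                     # Counter returns 0 for absent codes
        marks = round(n * width / peak) if peak else 0
        rows.append(f"{code} {n:5d} " + "#" * marks)
    return rows


# Made-up sample of coded reports, for illustration only.
for row in histogram(["D", "V", "V", "T", "U", "V"], width=9):
    print(row)
```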

The next histogram (Figure 2) illustrates errors by criticality
level for each of the three criticality indices. An additional U
classification was included to identify anomalies for which no
criticality level could be ascertained. The observations we can
make based upon this histogram are as follows:

o Level B errors were in the majority. Although workarounds
could be devised, such a large number of errors makes
existing quality assurance practices suspect.

o A large number of level A errors were identified. Critical
errors of such a large proportion immediately call attention
to review procedures and testing approaches used during
development.

The next histogram (Figure 3) illustrates criticality level by
error category. An additional classification, "questionable",
consists of "other" problem reports for which no change was
generated. These "questionable" errors were the subset of
"other" errors which resulted from documentation requests,
gripes, misunderstandings, politics and potential hardware
failures. The observations we can make based on this histogram
are as follows:

o Design errors seemed to cause a large number of critical
errors. This provided us with further evidence of the need
to investigate earlier detection of design errors.

o Data handling errors were also a cause of a large number of
critical errors.

o Surprisingly, "other" errors contributed a large number of
critical errors. This could be attributed to users who
could not operate the system or understand operational
anomalies and categorized them as critical to get immediate
attention. This data emphasized the need to revamp the
existing problem reporting procedure and to investigate ways
of improving the man/machine interface.




FIGURE 1
ERRORS BY TIME OF OCCURRENCE
[Histogram not reproduced; bar values: 399, 285, 254, 61, 1.]

FIGURE 2
ERRORS BY CRITICALITY
[Histogram not reproduced; vertical axis: number of errors.]

o Design and requirements errors were the largest single
source of problems.

o Some errors of the "questionable" subcategory of "other"
were not errors but really requests for changes or
documentation. This seemed to indicate the need to improve
existing problem reporting procedures and the mechanisms
used for quality control.
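A criticality-by-category breakdown like Figure 3 is a two-way tally. The Python sketch below counts (category, criticality) pairs using the Table 1 codes and reports the share of a category's reports rated critical. The sample records are invented for illustration and are not the DSN data.

```python
from collections import Counter


def crosstab(records):
    """Count (category, criticality) pairs, e.g. ("DE", "A") for a critical design error."""
    return Counter(records)


def share_critical(records, category):
    """Fraction of one category's reports that were rated critical (level A)."""
    table = crosstab(records)
    total = sum(n for (cat, _), n in table.items() if cat == category)
    return table[(category, "A")] / total if total else 0.0


# Invented sample records: (error category, criticality level).
sample = [("DE", "A"), ("DE", "A"), ("DE", "C"), ("DH", "A"), ("OT", "B")]
print(crosstab(sample)[("DE", "A")], share_critical(sample, "DE"))
```

Reporting a share as well as a raw count helps separate categories that are merely frequent from those whose errors tend to be critical.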

The major findings of this study can be summarized as follows:

o Software error data is an important management tool because
it indicates where problems exist and where management
attention should be placed. For future projects, the
classification of error data should be performed as
anomalies are reported. This would help assure that the
error was more fully understood as it was reported. It
could also be used to identify error-prone modules and
provide information upon which repair or replace decisions
could be based.

o Analysis of the DSN software error data base indicated that
many of the critical errors occurred during the requirements
definition and design phases. These errors are the most
costly to correct, especially if they are not caught early
in the development cycle.

o Many of the "other" error types could be attributed to
poorly defined man/machine interfaces (e.g., commands that
are difficult to use or whose incorrect usage causes the
system to halt), improper and imprecise procedures for
handling exceptions, inadequate documentation and/or user
misconceptions (requests for enhancements/modifications
that were not really problems at all).

ACKNOWLEDGEMENT 

Portions of this paper are based upon work performed by Reifer
Consultants, Inc. under Contract LI-726929 to the Jet Propulsion
Laboratory, California Institute of Technology. It utilizes Deep
Space Network anomaly data compiled by Ms. Connie Johnsen and
analyzed by SoHaR, Inc. under subcontract to RCI. Many people
supported our efforts and all of their contributions are
acknowledged. Special thanks are extended to the DACS at Rome
Air Development Center, which has agreed to distribute the error
data base free to interested parties.




Appendix 

Software Error Taxonomy Definitions 


Time of Error Occurrence 


Four time classifiers were chosen because they were compatible
with the DSN anomaly report data provided as input. The classifiers
are as follows:

(D) Development - Anomalies in this category were reported
during the design, coding and module unit testing
activities. Most required design or programming revisions
to be made. Errors in this category typically dealt with
design problems between modules or with functional
limitations of design. An example follows:

"A system was required to provide human readable
error messages on a log device. Unfortunately,
the function was not specified in either the
requirements or design specification. The error
was discovered during a design review and an
anomaly report was opened. Under such
circumstances, we would state that the anomaly had
occurred during development."

(V) Verification - Anomalies in this category were reported
during integration and testing activities. Most were
specification deviations that required the code to be
revised. An example follows:

"Module X expects a true or false condition as
input from module Y. Unfortunately, module Y has
not been specified to provide the true or false
input. A test identified this problem during
testing and an anomaly report was written scoping
the rework. Under such circumstances we would
state that the anomaly had occurred during
verification."

(A) Acceptance - Anomalies in this category were reported
during formal testing of the software. Errors in this
category usually stem from requirements problems or
improper mechanization. An example follows:

"The system malfunctions when accepting more than
six simultaneous inputs. The error was discovered
during formal testing when the program was stressed
and an anomaly report was written. Under such
circumstances, we would state that the anomaly had
occurred during acceptance."




(T) Transfer - Anomalies in this category were reported after
the software package was put into operation in a live
environment. These anomalies usually resulted from halts,
failures or malfunctions. An example of such an anomaly
follows:

"The software halts when a zero input value is
received. This error was discovered during operation
when the DSN was reducing telemetry data. Under such
circumstances, we would state that the anomaly occurred
during transfer."

Error Criticality 

The three error criticality classifiers used are defined as follows: 

• Level A - Critical error (error impacts mission performance
or seriously degrades capability and no workaround exists).

An example follows: 

"The system halts when the value of one of its inputs 
exceeds its nominal end of range. Manual intervention 
is required before operation can be resumed. Under such 
circumstances, we would state that a level A error had 
occurred." 

• Level B - Dangerous situation (error exists that could degrade
performance or capability but a workaround exists). An example
follows:

"A particular utility function causes the system to halt
to await operator's action. The utility function is not
required for correct system operation and can be
disabled temporarily to correct the problem. Under
such circumstances, we would state that a level B error
had occurred."

• Level C - Minor problem (error exists that doesn't impact
performance or capabilities and can be fixed at a more leisurely
pace). An example follows:

"An informational message is displayed twice (rather
than once) each time it is enabled. No other
negative effect happens. Under such circumstances,
we would state that a level C error had occurred."

Error Category 

The third dimension of the DSN/RCI error taxonomy is error category.
Each of the ten error categories was defined so that insight into the error
causes could be ascertained. The ten categories are defined as follows:





1. Computation - Computation anomalies are errors in or resulting
from coded equations. Examples of computation errors include:
(a) Incorrect operand in equation, (b) Incorrect use of
parentheses, (c) Incorrect equation, (d) Missing computations
and (e) Rounding or truncation error.

2. Logic - Logic anomalies are errors in sequencing, control
or loop conditions. Examples of logic errors include:
(a) Logic out of sequence, (b) Wrong variable being checked,
(c) Missing logic or condition tests, (d) Too many/few
statements in loop and (e) Loop iterated incorrect number
of times.

3. Data Handling - Data handling anomalies are errors in handling
input/output. Examples of data handling errors include:
(a) Data initialization incorrect, (b) Variables not set
properly, (c) Variable type incorrect, (d) Data packing/unpacking
incorrect and (e) Subscripting error.

4. Interface - Interface anomalies are errors in communications
between a routine and other routines, the data base and/or
the user. Examples of interface errors include: (a) Data
incorrectly transmitted from one routine to another, (b)
Data incorrectly set/used from the data base, (c) Improper
input/output synchronization and (d) Data sent to wrong
destination.

5. Data Base - Data base anomalies are errors in preset data.
Examples of data base errors include: (a) Data should have
been initialized in data base but wasn't, (b) Data initialized
to incorrect value and (c) Data base units are incorrect.

6. Operation - An operation anomaly is an error occurring
as the software executes. Examples of operation errors
include: (a) Operating system errors, (b) Hardware
errors, (c) Operator errors, (d) Compiler or support software
errors and (e) Test execution errors.

7. Requirements Incorrect - Requirements errors deal with improper
or ambiguous functional and software requirements
specifications and not with implementation and/or
operation. Software may correctly solve the wrong problem
if it is specified improperly.

8. Design - Design errors deal with improper architectural and
detailed design specifications which form the basis on
which the program and the data base are mechanized.

9. Clerical - Clerical anomalies occur when people are involved
in the translation. Examples of clerical errors include
keypunch errors, typos and/or transliteration.

0. Other - Other is a "catch-all" for other types of errors not
encompassed by the scheme. Examples of other errors include
incorrectly reporting that an anomaly had occurred when in
reality it was a programmer misconception.
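The taxonomy gives each anomaly three coordinates:

- time of occurrence (D, V, A, T);
- criticality (A, B, C);
- one of the ten categories.

If all three are recorded when the report is opened, as the findings recommend, questions such as "which modules attract the most serious errors?" become a simple tally. The Python sketch below is only an illustration under stated assumptions. The `AnomalyReport` record, its `module` field and the `error_prone_modules` helper are hypothetical names, not part of the DSN/RCI scheme. Ranking modules by level A and B counts is just one possible measure of error-proneness.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    """Time of error occurrence."""
    DEVELOPMENT = "D"
    VERIFICATION = "V"
    ACCEPTANCE = "A"
    TRANSFER = "T"


class Criticality(Enum):
    """Error criticality, from A (critical) to C (minor)."""
    A = "critical: no workaround"
    B = "dangerous: workaround exists"
    C = "minor: fix at leisure"


class Category(Enum):
    """The ten error categories, numbered as in the taxonomy (Other = 0)."""
    OTHER = 0
    COMPUTATION = 1
    LOGIC = 2
    DATA_HANDLING = 3
    INTERFACE = 4
    DATA_BASE = 5
    OPERATION = 6
    REQUIREMENTS_INCORRECT = 7
    DESIGN = 8
    CLERICAL = 9


@dataclass(frozen=True)
class AnomalyReport:
    """One classified anomaly report (hypothetical record layout)."""
    report_id: str
    module: str              # hypothetical field: module charged with the error
    phase: Phase
    criticality: Criticality
    category: Category


def error_prone_modules(reports, top=3):
    """Rank modules by their count of level A and level B anomalies."""
    serious = {Criticality.A, Criticality.B}
    counts = Counter(r.module for r in reports if r.criticality in serious)
    return counts.most_common(top)


if __name__ == "__main__":
    reports = [
        AnomalyReport("AR-1", "telemetry", Phase.TRANSFER, Criticality.A, Category.LOGIC),
        AnomalyReport("AR-2", "telemetry", Phase.ACCEPTANCE, Criticality.B, Category.INTERFACE),
        AnomalyReport("AR-3", "display", Phase.VERIFICATION, Criticality.C, Category.CLERICAL),
        AnomalyReport("AR-4", "display", Phase.DEVELOPMENT, Criticality.A, Category.DESIGN),
    ]
    print(error_prone_modules(reports))  # [('telemetry', 2), ('display', 1)]
```

The same records could equally be tallied by phase or by category to show where in the life cycle errors are being caught.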



THE VIEWGRAPH MATERIALS
for the

D. BUCKLAND PRESENTATION FOLLOW



ERROR TAXONOMY:

WHAT CAN BE GAINED?


by

D. E. Buckland



Reifer Consultants, Inc. 



25550 Hawthorne Boulevard, Suite 208/Torrance, California 90505 





WHAT DOES IT DO?

AN ERROR TAXONOMY IS A TOOL
THAT ENABLES US TO BETTER
LEARN FROM OUR MISTAKES



PROBLEM STATUS REPORTS

o Aid in the Evaluation of What You Have Left To Do
o Group Similar Items Together For Expeditious Handling
o Assign Priorities
o Increase: Test Coverage
            Objectivity
            Communication
            Turnaround
o Reduce Time on Paperwork
o Learn From Past Experiences
o Identify Weak Spots
o Spot Trends



A CLEAN, SIMPLE PROBLEM REPORTING SYSTEM

Problem Reports
Problem Investigation
Problem Resolution
Problem Evaluation

Individual Basis



o BUILD IN FLEXIBILITY

o REPORT BY BASELINE

o USE CHECKBOXES WHERE POSSIBLE

o PLAN AHEAD

o AUTOMATE

o ONE CENTRAL POINT OF CONTROL

o COMMON TERMINOLOGY



AT TIME OF DISCOVERY:

o WHO FOUND IT?
o WHERE WAS IT FOUND?
o WHAT HAPPENED?
o WHAT WAS BEING USED?
o IS THIS A REOCCURRENCE OF A CLOSED PROBLEM?
o WHAT IS THE LEVEL OF CRITICALITY?


WHAT WILL I NEED TO KNOW? (cont)

AFTER THE PROBLEM HAS BEEN CLOSED:

o WHY DID IT FAIL?
o WHAT WAS THE SOLUTION?
o WHO ASSIGNED THE RESOLUTION?
o WHEN WAS IT CLOSED?



o PROBLEM OCCURRENCE RATES

o RESOLUTION RATES

o INFORMATION BY CATEGORY:
  o System
  o Subsystem
  o Module
  o Criticality
  o Time
  o Problem Type



ANALYSIS OF SOFTWARE ERRORS FROM DSN

Study Conducted by RCI for JPL in 1981






THE CLASSIFICATION OF ERRORS IS AN IMPORTANT MANAGEMENT TOOL



COST ESTIMATION

K. Rone, IBM
R. Tausworthe, JPL
R. Britcher, J. Gaffney, IBM



N83 32367 


MAINTENANCE ESTIMATION METHODOLOGY


BY 


KYLE Y. RONE 


INTERNATIONAL BUSINESS MACHINES CORPORATION 
FEDERAL SYSTEMS DIVISION 
HOUSTON, TEXAS


K. Rone 
IBM 
1 of 28 



INTRODUCTION 


As a project nears the end of its development phase and prepares to enter
a maintenance phase, several questions arise for which there are no ready
answers:

o How many people are required to maintain the system? 

o What is the required critical skills level to support the 
project? 

o What is the required staffing level to be responsive to
customer needs?

o How much of the staffing level can be used to perform new 
development work? 

The purpose of this paper is to develop a rational, systematic approach
to answering these questions. The approach selected uses a Rayleigh
curve method of projection combined with a modified matrix method to
forecast maintenance needs and required staffing levels. The curves
generated by the two methods are differenced to ascertain how much new work
can be performed given the staffing line. Finally, actual project data
are compared to the projection to validate or modify the process.




DETERMINING MAINTENANCE NEEDS 


In order to determine maintenance needs in the future, it is first 
necessary to examine the entire software development process. Studies by 
Peter Norden of IBM (Reference 1) have shown that research and development 
projects are composed of cycles. When these cycles are related to one 
another and added together, a curve results which represents the entire
project. Furthermore, these curves can be approximated by the Rayleigh 
curve forms given in Figure 1. Since software systems follow a life 
cycle process similar to other research and development projects, the 
Rayleigh curve method is selected for use in this methodology. 

To use this method, the foregone development phase is examined for actual 
manpower expenditures. A Rayleigh curve is then generated which 
approximates the curve of the expenditures during the development process. 
The resultant curve beyond the delivery point of the software system 
represents a projection of manpower needs during the maintenance process 
which is driven by the work expended during the development process. 
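The text does not spell out the curve forms of Figure 1. The sketch below therefore assumes the commonly used Norden-Rayleigh staffing form m(t) = 2*K*a*t*exp(-a*t^2), where K is total life-cycle effort and a sets the time of peak staffing.

Taking logarithms gives ln(m/t) = ln(2Ka) - a*t^2, which is linear in t^2. An ordinary least-squares line therefore recovers K and a from monthly development expenditures, and the fitted curve evaluated past delivery is the maintenance projection. The log-linear fit and the function names are illustrative choices, not necessarily how the original projections were computed, and the example data are synthetic.

```python
import math


def rayleigh(t, K, a):
    """Norden-Rayleigh manpower rate m(t) = 2*K*a*t*exp(-a*t^2)."""
    return 2.0 * K * a * t * math.exp(-a * t * t)


def fit_rayleigh(months, staff):
    """Fit K and a to observed (t, m) pairs via ln(m/t) = ln(2Ka) - a*t^2.

    months: elapsed months t > 0 for each observation
    staff:  observed manpower m(t) > 0 for each observation
    Returns (K, a): total life-cycle effort and the shape parameter.
    """
    xs = [t * t for t in months]
    ys = [math.log(m / t) for t, m in zip(months, staff)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # estimate of -a
    intercept = mean_y - slope * mean_x   # estimate of ln(2*K*a)
    a = -slope
    K = math.exp(intercept) / (2.0 * a)
    return K, a


if __name__ == "__main__":
    # Synthetic development history (NOT project data): 24 months drawn
    # from a curve with K = 500 person-months and a = 0.002.
    dev_months = list(range(1, 25))
    dev_staff = [rayleigh(t, 500.0, 0.002) for t in dev_months]
    K, a = fit_rayleigh(dev_months, dev_staff)
    # Project maintenance-phase staffing beyond month 24.
    for t in range(25, 49, 6):
        print(t, round(rayleigh(t, K, a), 2))
```

Fitting in log space weights every month equally in relative terms. A direct nonlinear least-squares fit would weight the high-staffing months more heavily.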







DETERMINING A REASONABLE LEVEL OF SUPPORT 


The Rayleigh curve method, then, projects future work based on past work.
This method, however, is based on pure work required and does not address
other project needs such as critical skills and response to software system
problems. Given that the development work stops at some point, the
curve will eventually go to zero, whereas, as long as software support is
required, the project will continue to supply it. A method is required,
then, to determine a reasonable level of software development support to
be provided to the customer at some steady-state period in the future.

To accomplish these goals, a study is performed across the software
project to determine functional elements and drivers for each project
area. These functional elements and drivers are then used to develop a
matrix approach to estimating support levels for each project area. Each
element is then quantified by software size, number of test cases
required, or by development manpower level. These quantifiers are then
transformed into maintenance levels for the element by use of the
following general equation:

$$\text{Maintenance Level} = \frac{\text{ELEMENT SIZE}}{(\text{Productivity})\,(\text{Complexity Factor})\,(\text{Level Factor})}$$

where: Productivity = development or test productivity factor
       Complexity Factor = varies about .5 based on the complexity of the element
       Level Factor = 12 (length of development)
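The matrix equation translates directly into code, as in the minimal Python sketch below. Reading "Level Factor = 12 (length of development)" as 12 times the development length in years (i.e. months) is an assumption. The element size, productivity and complexity values in the example are hypothetical, not taken from the Shuttle data.

```python
def level_factor(development_years):
    """Level Factor = 12 (length of development); assumes length in years."""
    return 12.0 * development_years


def maintenance_level(element_size, productivity, complexity, level):
    """Maintenance Level = ELEMENT SIZE / ((Productivity)(Complexity)(Level Factor)).

    element_size: size of the element (e.g. words or test cases)
    productivity: development or test productivity, size units per person-month
    complexity:   complexity factor, varying about 0.5
    level:        level factor from level_factor(), in months
    With these units the result is a staffing level in persons.
    """
    return element_size / (productivity * complexity * level)


if __name__ == "__main__":
    # Hypothetical element: 10,000 words at 500 words/person-month,
    # complexity factor 0.5, two-year development.
    print(round(maintenance_level(10_000, 500, 0.5, level_factor(2)), 2))  # prints 1.67
```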





The resultant maintenance levels are then tempered and modified based on
judgments concerning critical skills and operations support, and the
totals are increased by a fixed percentage to cover management and
support. An example of a matrix for a given area of software is depicted
in Figure 2. All areas are summarized for the project to determine the
required support level (Figure 3). This generated level can be plotted
with the Rayleigh curve as shown in Figure 4. The Rayleigh curve
represents current effort required based on past effort. The optimal staffing level to be
reached in steady state is represented by the support line.




SM

FUNCTION                  STS-1   STS-2   DEV.    MAINT.  CRITICAL  OPN      TOTAL
                          SIZE    SIZE    LEVEL   LEVEL   SKILLS    SUPPORT  SUPPORT
SM BASIC                  10875   11792   3.2     1.4     -         -        1.4
SM/SP                      4657    4709   1.3      .6     -         -         .6
SM-DISP CONT. PROC.        9109   12041   3.2     1.4     -         -        1.4
DOWNLIST                  10106   10106   2.7     1.2     -         -        1.2
RMS                          45    7574   2.0     1.0     1.0       -        2.0
SM ROLL INS                8658    8658   2.3     1.1     -         -        1.1
SM, DL, ANNUN. PREPROC.       -       -   ?       1.5     ?         1.0      3.0


FIGURE 2. EXAMPLE OF AREA MATRIX




MATRIX ESTIMATE SUMMARY

AREA         SIZE         MAINT.   OPN &     MAS     TOTAL
                          LEVEL    SUPPORT
AASD         272918 FW     42.0     12.0     10.0     64.0
AASD          43318 FW     31.9      8.0     10.1     50.0
CON/QA            -         5.0       -       1.0      6.0
SEC. SUPP.        -        11.0       -        -      11.0
SDL          875K S/L      36.0      4.0      7.0     47.0
AS VO         1247 TC      82.5      5.0     15.5    103.0
SAS MAS           -          -        -       4.0      4.0
TOTAL                     208.4     29.0     47.6    285.0


FIGURE 3. EXAMPLE OF MATRIX ESTIMATE SUMMARY
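Each area total is a plain sum of its maintenance, operations-support and management-and-support (MAS) entries, so the summary can be cross-checked mechanically. The Python sketch below re-adds the rows of Figure 3, treating blank cells as zero. The column meanings are inferred from the surrounding text, which is an assumption. The MAS column sums to 47.6, the value that reconciles 208.4 + 29.0 with the 285.0 grand total.

```python
# Rows of Figure 3: (area, maintenance level, operations & support, MAS).
# None marks a blank cell in the figure.
ROWS = [
    ("AASD", 42.0, 12.0, 10.0),
    ("AASD", 31.9, 8.0, 10.1),
    ("CON/QA", 5.0, None, 1.0),
    ("SEC. SUPP.", 11.0, None, None),
    ("SDL", 36.0, 4.0, 7.0),
    ("AS VO", 82.5, 5.0, 15.5),
    ("SAS MAS", None, None, 4.0),
]


def row_total(row):
    """Sum the numeric cells of one area, treating blanks as zero."""
    return round(sum(v or 0.0 for v in row[1:]), 1)


def column_total(rows, index):
    """Sum one column (1 = maint., 2 = opn & support, 3 = MAS) over all areas."""
    return round(sum(r[index] or 0.0 for r in rows), 1)


if __name__ == "__main__":
    for row in ROWS:
        print(f"{row[0]:<12}{row_total(row):>8.1f}")
    totals = [column_total(ROWS, i) for i in (1, 2, 3)]
    print("column totals:", totals, "grand total:", round(sum(totals), 1))
```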




MANPOWER AVAILABLE TO PERFORM NEW WORK


The plot of the Rayleigh curve and the support line can also be
represented as two equations. By integrating the difference between the
two equations and evaluating over the time of interest, the area between
the curves is generated. This area represents the amount of manpower
supported by the staffing level which is not committed to maintenance of
past work, and hence, can be applied to new tasks (Figure 5).
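The text does not give the equations of the two curves, so the sketch below assumes the simplest case. It takes a flat support line at level s (persons) lying above a Rayleigh curve m(t) = 2*K*a*t*exp(-a*t^2) over the interval [t1, t2].

The Rayleigh curve has the closed-form integral K*(1 - exp(-a*t^2)). The area between the curves is therefore s*(t2 - t1) - K*(exp(-a*t1^2) - exp(-a*t2^2)). A trapezoid-rule version is included as an independent check. The function names and the parameter values in the example are hypothetical.

```python
import math


def rayleigh(t, K, a):
    """Rayleigh manpower rate m(t) = 2*K*a*t*exp(-a*t^2)."""
    return 2.0 * K * a * t * math.exp(-a * t * t)


def new_work_capacity(support, K, a, t1, t2):
    """Area between a flat support line and the Rayleigh curve on [t1, t2].

    Assumes the support line lies above the curve over the whole interval;
    the result is in person-months if support is in persons and t in months.
    """
    maintenance = K * (math.exp(-a * t1 * t1) - math.exp(-a * t2 * t2))
    return support * (t2 - t1) - maintenance


def new_work_capacity_numeric(support, K, a, t1, t2, steps=10_000):
    """Same area by the trapezoid rule, as a cross-check of the closed form."""
    def gap(t):
        return support - rayleigh(t, K, a)

    h = (t2 - t1) / steps
    inner = sum(gap(t1 + i * h) for i in range(1, steps))
    return h * (0.5 * gap(t1) + inner + 0.5 * gap(t2))


if __name__ == "__main__":
    # Hypothetical: K = 500 person-months, a = 0.002, support line of
    # 10 persons, evaluated from month 36 to month 60.
    print(round(new_work_capacity(10.0, 500.0, 0.002, 36, 60), 1))
    print(round(new_work_capacity_numeric(10.0, 500.0, 0.002, 36, 60), 1))
```

If the support line were not flat, only the `support * (t2 - t1)` term would change, to the integral of the support-line equation over the same interval.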




CONVERTING DIRECT ESTIMATES TO TOTAL PROJECT COSTS 


Using the manpower available to perform new work requires that direct 
work estimates be converted to project costs consistent with the project 
costs represented by the curves. To derive this relationship, examine 
the direct costs and overhead costs from actual data and calculate: 


$$\text{PROJECT FACTOR} = \frac{\text{Total Project Cost}}{\text{Direct Estimate}}$$


Using this factor, an estimate for a change or group of changes can be 
turned into a total project cost and used to "fill up" the area between 
the curves (Figure 6) until the project’s capacity to perform new work is 
exhausted. 
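The conversion and the "fill up" step can be sketched as follows. The sketch assumes:

- direct estimates are taken in priority order;
- filling stops at the first change that no longer fits.

The change names, direct-estimate values and capacity are hypothetical. The factor 6.25 in the example is the change-request (CR) multiplier quoted on the author's viewgraph (PROJECT COST = 6.25 (CRA + CRS + CRV)). Another project would need its own factor derived from actual cost data.

```python
def project_factor(total_project_cost, direct_estimate):
    """PROJECT FACTOR = Total Project Cost / Direct Estimate (from actual data)."""
    return total_project_cost / direct_estimate


def fill_capacity(changes, factor, capacity):
    """Convert direct estimates to project costs and accept changes in order
    until the next one would exceed the capacity (area between the curves).

    changes:  list of (name, direct_estimate) in priority order
    factor:   project factor
    capacity: available new-work capacity, in the same units as the costs
    Returns (accepted change names, capacity used).
    """
    accepted, used = [], 0.0
    for name, direct in changes:
        cost = direct * factor
        if used + cost > capacity:
            break  # stop here; a variant could skip and try smaller changes
        accepted.append(name)
        used += cost
    return accepted, used


if __name__ == "__main__":
    changes = [("CR-A", 10.0), ("CR-B", 12.0), ("CR-C", 15.0)]  # hypothetical
    print(fill_capacity(changes, 6.25, capacity=200.0))  # (['CR-A', 'CR-B'], 137.5)
```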


FIGURE 6. USING THE MANPOWER TO PERFORM NEW WORK


VALIDATION OF THE PROCESS 


This methodology can be validated only by using the process and comparing
the result to actual data. Since the maintenance phase has not yet
occurred, a comparison of the method to an independently derived 
projection is an alternate approach. Figure 7 represents the use of the 
methodology on the Onboard Shuttle Software project. The figure presents 
the Rayleigh curve representing Release 19 of the flight software. 

Actual data from the project was compared with the curve as shown from
1/78 through 9/79. The results compared within 1% of real costs. The
projected costs beyond 9/79 compared within 5% of projected costs derived 
by a bottom up estimate. The data from 1/77 to 1/78 were not comparable 
due to previous project costs embedded in the actual costs and functional 
design costs not included in the Rayleigh curve. 





SUMMARY 



The Maintenance Estimation Methodology is a method of projecting
maintenance needs and required staffing levels. The methodology is
summarized in the following steps:

SOFTWARE DEVELOPMENT AND MAINTENANCE PROJECTION 

1. Use previous projection or actual data and assume that the work 
stops after last designated release. 

2. Use Rayleigh curve method to project maintenance needs after the
release.

3. Use matrix method to determine support line needed in a steady-state
period.

4. Compute the area between the two curves by integration.

5. Estimate the new work to be performed by transforming direct work 
estimates into project estimates. 

6. Determine if new work fits under the support line. If not, either 
adjust schedules or phasing to reach support line. 

7. Add new work scope and recompute the Rayleigh curve to compare phasing
and to provide the basis for the next projection.


REFERENCES 


1. Norden, Peter V., "Useful Tools for Project Management," Management
of Production, M. K. Starr (Editor), Penguin Books, Inc., Baltimore,
MD, 1970, pp. 71-101.





THE VIEWGRAPH MATERIALS
for the

K. RONE PRESENTATION FOLLOW



Federal Systems Division 

1322 Space Park Drive, Houston 77058 


MAINTENANCE ESTIMATION METHODOLOGY 
PRESENTATION 



K.Y. RONE 

AUGUST 7, 1980 







SPACE SHUTTLE PROGRAMS

Title: MAINTENANCE ESTIMATION METHODOLOGY

IBM


WHAT IS NEEDED?



SPACE SHUTTLE PROGRAMS

Title: MAINTENANCE ESTIMATION METHODOLOGY

Date 1/16/80


NEED: A METHOD OF DETERMINING A REASONABLE LEVEL OF SOFTWARE
DEVELOPMENT SUPPORT IN A STEADY STATE PERIOD

SOLUTION: MATRIX METHOD

USE:

DETERMINE FUNCTIONAL ELEMENTS OF PROJECT

QUANTIFY MAINTENANCE NEEDS BASED ON:
LEVEL = FUNCTION SIZE / ((PRODUCTIVITY) (COMPLEXITY) (FACTOR))

CONSIDER CRITICAL SKILLS, LEVEL 3 TEST, OPERATIONS
SUPPORT AND MANAGEMENT AND SUPPORT

SUMMARIZE FOR PROJECT

PLOT WITH RAYLEIGH CURVE

RAYLEIGH CURVE REPRESENTS CURRENT EFFORT REQUIRED
BASED ON PAST EFFORT

SUPPORT LINE REPRESENTS LINE TO TEND TOWARD AND
REACH IN STEADY STATE


[Plot: LEVEL versus TIME]









• NEED: A METHOD OF DETERMINING MANPOWER AVAILABLE TO PERFORM NEW WORK

• SOLUTION: CALCULATE AREA BETWEEN CURVES

• USE: - INTEGRATE DIFFERENCE BETWEEN CURVES
       - EVALUATE OVER TIME OF INTEREST
       - AREA REPRESENTS EFFORT NOT USED IN MAINTENANCE
         OF PAST WORK WHICH CAN BE APPLIED TO NEW TASKS









NEED: A METHOD OF CONVERTING CR ESTIMATES TO TOTAL PROJECT COSTS

SOLUTION: PROJECT COST EQUATIONS

USE:

EXAMINE "CR" AND "FIXED" COSTS IN RECENT PROPOSALS

DETERMINE RELATIONSHIP BETWEEN CR AND TOTAL COSTS

PROJECT COST = 6.25 (CRA + CRS + CRV)

WHERE CRA = APPLICATION CR COSTS
      CRS = SSW CR COSTS
      CRV = VERIFICATION CR COSTS

PROJECT COSTS REPRESENT THE COSTS WHICH WILL BE
USED TO "FILL UP" THE AREA BETWEEN THE CURVES




• NEED: VALIDATION

• SOLUTION: COMPARE RESULTS OF THE SCHEME TO PAST PROJECT DATA AND
  CURRENT PROJECTIONS

• COMPARISON: - RESULTS COMPARED WITH RESULTS OF THE EXTENSION
                PROPOSAL
              - COMPARES WITHIN 7% OF REAL COSTS
              - COMPARES WITHIN 3-5% OF PROJECTED COSTS
              - EARLY COSTS NOT COMPARABLE DUE TO:







STAFFING IMPLICATIONS OF SOFTWARE PRODUCTIVITY MODELS
Robert C. Tausworthe

Jet Propulsion Laboratory*

California Institute of Technology
Pasadena, California


ABSTRACT

This paper investigates the attributes of software project
staffing and productivity implied by equating the effects of two
popular software models in a small neighborhood of a given
effort-duration point. The first model, the "communications
overhead" model, presupposes that organizational productivity
decreases as a function of the project staff size, due to
interfacing and intercommunication. The second, the so-called
"software equation," relates the product size to effort and
duration through a power-law tradeoff formula. The conclusions
that may be reached by assuming that both of these describe
project behavior, the former as a global phenomenon and the latter
as a localized effect in a small neighborhood of a given
effort-duration point, are that (1) there is a calculable maximum
effective staff level, which, if exceeded, reduces the project
production rate, (2) there is a calculable maximum extent to
which effort and time may be traded effectively, (3) it becomes
ineffective in a practical sense to expend more than an
additional 23-50% of resources in order to reduce delivery time,
(4) the team production efficiency can be computed directly from
the staff level, the slope of the intercommunication loss
function, and the ratio of exponents in the software equation,
(5) the ratio of staff size to maximum effective staff size is
directly related to the ratio of the exponents in the software
equation, and therefore to the rate at which effort and duration
can be traded in the chosen neighborhood, and (6) the project
intercommunication overhead can be determined from the staff
level and software equation exponents, and vice versa. Several
examples are given to illustrate and validate the results.


*The research reported in this paper was carried out at the Jet
Propulsion Laboratory of the California Institute of Technology
under a contract sponsored by the National Aeronautics and Space
Administration.

R. Tausworthe 
JPL 
1 of 34 




I. INTRODUCTION 

Brooks [1], in The Mythical Man-Month, proposed a simple
model of software project intercommunication to show that, if
each task of a large project were required to interface with
every other task, then the associated intercommunication overhead
would quickly negate the believed advantage of partitioning a
large task into subtasks. While not meant to be an accurate
portrayal of an actual project, the model effectively illustrated
an increasing inefficiency symptomatic of projects too large to
be performed by a single individual.

Putnam [2], in a 1977 study of software projects undertaken
by the US Army Computer Systems Command, discovered a statistical
relationship among product lines of code, work effort, and time
duration for those projects, whose best-fit formula was a power-law
relationship, now referred to as the "software equation,"

$$L = c_k\, W^{0.33}\, T^{1.33}$$

(I have taken the liberty of changing Putnam's notation in order
to be consistent with my notation in the remainder of the
article.)

One rather startling extrapolation one may make from the 
software equation is that in order to halve the duration of any 
one of the projects studied, it would have taken 16 times the 
resources actually used! I say "extrapolation" because I 
suspect the software equation is more likely to be applicable 
incrementally — that is, if one were to require a 5% shortening of 
the schedule, then a 20% (actually 21.5%) increase in resources 
would be required. 
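Both figures follow from holding L fixed in the software equation. With exponents p and q on W and T, effort must scale as W2/W1 = (T1/T2)^(q/p), and q/p = 4 for the exponents 1/3 and 4/3 that 0.33 and 1.33 round. The short Python sketch below (function name illustrative) reproduces the factor of 16 for halving the schedule. Its second example shows that the quoted 21.5% corresponds to dividing T by 1.05, a shortening of about 4.8%; a strict 5% cut, T2/T1 = 0.95, gives about 22.8%.

```python
def effort_multiplier(schedule_ratio, p=1.0 / 3.0, q=4.0 / 3.0):
    """Effort ratio W2/W1 that keeps L = c_k * W**p * T**q unchanged.

    schedule_ratio: T2/T1, e.g. 0.5 to halve the schedule.
    From W1**p * T1**q = W2**p * T2**q it follows that
    W2/W1 = (T2/T1) ** (-q/p).
    """
    return schedule_ratio ** (-q / p)


if __name__ == "__main__":
    print(round(effort_multiplier(0.5), 6))        # 16.0  (halve T)
    print(round(effort_multiplier(1 / 1.05), 4))   # 1.2155 (T divided by 1.05)
    print(round(effort_multiplier(0.95), 4))       # 1.2277 (strict 5% cut)
```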

In this paper, I will generalize both of these models 
parametrically, and suppose that both do describe the statistical 
trends of software projects in small neighborhoods about a chosen 
project situation. By equating the model behaviors in these
neighborhoods, we shall be able to see how the parameters of one 
model relate to the parameters in the other. In addition, we 
shall discover some rather interesting facts about some actual 
projects for which published data exists. 



II. A GENERALIZED INTERCOMMUNICATION OVERHEAD MODEL

Let us suppose that a software project is to develop L kilo-lines
of executable source language instructions, and that this
number remains fixed over all our considerations of effort,
duration, staffing, etc. That is, we shall suppose that the
product size is invariant over the neighborhood of variability in
these parameters: a project utilizing greater effort attempting
to shorten the schedule slightly would produce the same program
as a smaller effort requiring somewhat more time.

Let us denote by W the work effort (in person-months) to be
expended in the production of the L lines of code, and let the
time duration, in months, be denoted by T. Then the average
full-time equivalent staff size, S, in persons, is

$$S = W / T$$

and the overall team productivity can be defined as the number
of kilo-lines produced per person-month,

$$P = L / W \quad \text{(kilo-lines/person-month)}$$

Let us further suppose that the average fraction of time
that each staff member spends in intercommunication overhead is
dependent on the staff size alone, within a particular
organizational structure and technology level, and let this
fraction be denoted by t(S):

$$t(S) = \frac{\text{intercommunication time}}{\text{hours/mo. worked}}$$

Generally speaking, one intuitively expects t(S) to increase
monotonically in S due to the expanding number of potential
interfaces that arise as staff is increased.

But the individual productivity of the staff, defined as
the individual productivity during non-intercommunication
periods, $P_i$, is somewhat greater than P, being related to it by

$$P = P_i\,[\,1 - t(S)\,]$$

The relationship between the number of kilo-lines produced,
the effort, and the staffing is

$$L = P_i\, W\, [\,1 - t(S)\,]$$

Let us denote by $W_0$ and $T_0$ the effort and time,
respectively, that would be required by a single unencumbered
individual to perform the entire software task (assuming also
that it could be done entirely by this individual, no matter how
long it took). Then, with respect to the actual W and T, there
is the relationship




$$W_0 = L / P_i = W\,[\,1 - t(S)\,] = T_0$$

This $W_0$ represents the least effort that must be expended, and $T_0$
is the maximum time that will be required. By substituting W/T
for S, one obtains an effort-time tradeoff relationship

$$\omega = \frac{1}{1 - t(\omega/\tau)}$$

where $\omega = W/W_0$ and $\tau = T/T_0$ are "normalized" effort and
duration, respectively; because $W_0 = T_0$, the staff size is simply
$S = W/T = \omega/\tau$.

The rate at which an increase in staffing results in an
increase in normalized work effort is then

$$\frac{d\omega}{dS} = \omega^2\, t'(S) > 0$$

where $t'(\cdot)$ refers to the derivative of t with respect to S.
Because of the monotone character of t(S), an increase in staff
leads to an increase in effort.

The overall staff production rate, R, is the number of kilo-lines
of code per month produced by the entire team of S persons,

$$R = P_i\, S\, [\,1 - t(S)\,]$$

The factor

$$\eta = 1 - t(S)$$

is then the team production efficiency. Note that the normalized
task effort is the inverse of the production efficiency,

$$\omega = 1 / \eta$$

The maximum rate of software production will occur when the
derivative of R with respect to S becomes zero, a condition
requiring a value $S_0$ that will satisfy the relationship

$$t'(S_0) = \frac{1 - t(S_0)}{S_0}$$

We shall refer to this staffing level as the maximum effective
staff. Two particular examples of t(S) will serve to illustrate
the characteristics of the intercommunication overhead model.




Linear Intercommunication Overhead. Let us assume first, as
did Brooks, that the overhead is linear in staff,

$$t(S) = t_0\,(S - 1)$$

that is, there is no overhead for 1 person working alone, but
when there are S-1 other people, then each requires an average
fraction $t_0$ of that individual's time. Under these assumptions,
the maximum effective staff level is

$$S_0 = \frac{1 + t_0}{2\,t_0}$$

This value yields a maximum team production rate of

$$R_{\max} = \frac{P_i\, S_0^2}{2 S_0 - 1}$$

and team efficiency

$$\eta_0 = \frac{1 + t_0}{2} = \frac{S_0}{2 S_0 - 1} \approx 0.5$$

This perhaps alarming result states that a team producing at its
maximum rate is burning up half its effort in intercommunication
overhead! The behavior is illustrated in Figure 1.

The normalized effort-duration tradeoff equation for this
model takes the form

$$\tau = \frac{t_0\,\omega^2}{(1 + t_0)\,\omega - 1}$$

which has its minimum value at the maximum-production-rate point,

$$\tau_{\min} = \frac{4\,t_0}{(1 + t_0)^2} \approx 4\,t_0$$

at which point the normalized effort is

$$\omega_0 = \frac{2}{1 + t_0} < 2$$

Figure 2 shows the characteristic of this tradeoff law at $t_0$
values of 0.1 and 0.2, for illustrative purposes.

According to this model, it never pays to expend more than
twice the single-individual effort. Moreover, even though the $\omega$
producing the shortest schedule is less than 2, the effective
range is much less than this, as shown in the figure. Effort can
be traded for schedule time realistically only up to about 1.25
$W_0$, and a factor of two saving in time can only come about if the
individual intercommunication can be kept below about 15% per
interface.
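The closed-form results for the linear model are easy to check numerically. The Python sketch below (names illustrative) does three things:

- evaluates $S_0$, $\eta_0$, $\omega_0$ and $\tau_{\min}$ for a given $t_0$;
- confirms that the tradeoff curve reaches $\tau_{\min}$ at $\omega_0$;
- computes the overhead at which the absolute minimum duration is exactly half the single-person time.

Solving $4t_0/(1+t_0)^2 = 1/2$ gives $t_0 = 3 - 2\sqrt{2} \approx 0.17$, so the figure of about 15% per interface sits just inside that limit.

```python
import math


def linear_model(t0):
    """Closed-form results for the linear overhead t(S) = t0 * (S - 1)."""
    S0 = (1.0 + t0) / (2.0 * t0)            # maximum effective staff
    eta0 = (1.0 + t0) / 2.0                 # team efficiency at S0
    omega0 = 2.0 / (1.0 + t0)               # normalized effort at S0
    tau_min = 4.0 * t0 / (1.0 + t0) ** 2    # shortest normalized duration
    return S0, eta0, omega0, tau_min


def linear_tradeoff_tau(omega, t0):
    """Normalized duration tau = t0*omega^2 / ((1 + t0)*omega - 1)."""
    return t0 * omega ** 2 / ((1.0 + t0) * omega - 1.0)


if __name__ == "__main__":
    for t0 in (0.1, 0.2):
        S0, eta0, omega0, tau_min = linear_model(t0)
        print(f"t0={t0}: S0={S0:.2f} eta0={eta0:.2f} omega0={omega0:.3f} "
              f"tau_min={tau_min:.3f} check={linear_tradeoff_tau(omega0, t0):.3f}")
    # Overhead per interface at which the best case just halves the schedule.
    print(f"t0 for tau_min = 0.5: {3.0 - 2.0 * math.sqrt(2.0):.4f}")
```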


[Figures 1-3. Axes: normalized team rate, R/P_i; duration ratio, T/T_0; work effort ratio, W/W_0]

Exponentially Decaying Intercommunication Overhead. One
unsettling aspect of the linear intercommunication overhead model
is that, at some staffing level, the production rate goes to
zero and beyond, unrealistically, into negative values. Perhaps
a more realistic model is one which assumes that t(S) tapers off,
never exceeding unity, at a rate proportional to the remaining
fraction of time available for intercommunication as staff
increases, or

$$ t'(S) = t_1 [1 - t(S)] $$

Then we are led to the form

$$ t(S) = 1 - \exp[-t_1 (S - 1)] $$

The maximum effective staff in this case becomes

$$ S_0 = 1 / t_1 $$

and the maximum production rate is

$$ R_{\max} = P_i S_0 \exp[-1 + 1/S_0] \approx P_i S_0 / e $$

The team efficiency at this rate is

$$ \eta_0 = \exp[-1 + 1/S_0] \approx 1/e $$

Now this is perhaps an even more alarming revelation than before,
because it says that when producing software at the maximum team
rate, that team is burning up about 63% of its time in
intercommunication! The consolation, as shown in Figure 1, is
that the team performance under this assumed model is superior to
that of the linear-time team model. More staff can be applied
before the maximum effective staff level is reached.
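The same numerical check works for the exponentially decaying model. The minimal sketch below measures rates in units of $P_i$. Its function names are hypothetical, and the $t_1$ values are illustrative.

```python
import math

# Minimal sketch of the exponentially decaying overhead model,
# t(S) = 1 - exp[-t1 (S - 1)]; rates are in units of P_i.

def t_exponential(s: float, t1: float) -> float:
    return 1.0 - math.exp(-t1 * (s - 1.0)) if s > 1.0 else 0.0


def team_rate_exp(s: float, t1: float, p_i: float = 1.0) -> float:
    """Team production rate R = P_i * S * [1 - t(S)]."""
    return p_i * s * (1.0 - t_exponential(s, t1))


def exponential_optimum(t1: float, p_i: float = 1.0):
    s0 = 1.0 / t1                        # maximum effective staff
    eta0 = math.exp(-1.0 + 1.0 / s0)     # team efficiency at S0
    r_max = p_i * s0 * eta0              # maximum team rate
    return s0, r_max, eta0


if __name__ == "__main__":
    for t1 in (0.2, 0.1, 0.01):
        s0, r_max, eta0 = exponential_optimum(t1)
        print(f"t1={t1}: S0={s0:.1f}  R_max={r_max:.3f}  "
              f"overhead={1 - eta0:.1%}  (1 - 1/e = {1 - 1 / math.e:.1%})")
```

The overhead at $S_0$ is $1 - \exp(-1 + t_1)$. That is about 55% at $t_1 = 0.2$ and 59% at $t_1 = 0.1$. It approaches $1 - 1/e \approx 63\%$ only as $t_1$ becomes small.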

The effort-duration tradeoff equation according to this
model is

$$ \tau = \frac{t_1\, \omega}{t_1 + \ln \omega} $$

The minimum of τ occurs at

$$ \omega_0 = \exp(1 - t_1) < e $$

and the minimum value is

$$ \tau_{\min} = t_1 \exp(1 - t_1) \approx e\, t_1 $$

The form of this tradeoff is shown in Figure 3 for $t_1$ values of
0.1 and 0.2, for illustrative purposes. Note that the minimum of τ
is much broader in this model, so that, although the actual
minimum occurs when ω is about e in value, the realistic
effective range for ω is less than about 1.5. That is, it is not
cost-effective to expend more than about 1.5 times the single-
individual effort $W_0$ in an attempt to reduce the schedule time.
A reduction in schedule by a factor of two is possible only when
the individual intercommunication factor $t_1$ can be kept below
0.2.
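The exponential-model tradeoff $\tau = t_1 \omega / (t_1 + \ln \omega)$ can be checked the same way. Below is a minimal sketch with hypothetical function names and illustrative $t_1$ values.

```python
import math

# Minimal sketch: normalized tradeoff for exponentially decaying overhead,
# tau = t1 * omega / (t1 + ln omega), with tau = T/T0 and omega = W/W0.

def tau_exponential(omega: float, t1: float) -> float:
    return t1 * omega / (t1 + math.log(omega))


def exponential_tradeoff_minimum(t1: float):
    omega0 = math.exp(1.0 - t1)           # effort at the shortest schedule
    tau_min = t1 * math.exp(1.0 - t1)     # shortest normalized schedule
    return omega0, tau_min


if __name__ == "__main__":
    for t1 in (0.1, 0.2):
        omega0, tau_min = exponential_tradeoff_minimum(t1)
        print(f"t1={t1}: omega0={omega0:.3f}  tau_min={tau_min:.3f}  "
              f"tau(1.5)={tau_exponential(1.5, t1):.3f}")
```

At $t_1 = 0.2$, spending $1.5\,W_0$ gives $\tau \approx 0.495$. That is just inside a factor-of-two schedule reduction.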

Conclusions from Intercommunication Overhead Models. Both
of the examples of intercommunication overhead above bespeak a
maximum effective staffing level at which the project is 37-50%
efficient. Beyond this point, further staffing is counterproductive.
Both examples conclude that the maximum practical
extent to which added effort is effective in buying schedule time
is limited to about 25-50%. Significant schedule reduction
factors are possible only when the intercommunication factors can
be kept below 15-20%.


III. MATCHING THE SOFTWARE EQUATION MODEL

Let us generalize the Putnam Software Equation as the form

$$ L = c_k W^p T^q $$

and let us define r = q/p, the exponent ratio. As in the previous
section, L is held constant with respect to effort-duration
tradeoff considerations. The value of p is assuredly positive:
it generally requires more work at a given T to increase L. If q
is positive, effort can be traded to decrease the schedule time
required to deliver a given L. The larger r is, the larger the
increase in effort required to shorten the schedule, and the
larger the team production inefficiency. If q is zero, then L is
a function of W alone; T is determined solely by the staffing
level, T = W/S, and no additional effort is required to reduce
schedule time (in the neighborhood in which the values of p and
q = 0 are valid). If q were ever to be negative, then an increase in W
would render an increase in T, a situation indicating overmanned
projects.

Substitution of T = W/S (so that $W^{p+q} S^{-q} = L/c_k$ is constant),
differentiation with respect to S, and normalization of the software
equation produce the result

$$ \frac{d\omega}{dS} = \frac{\omega\, r}{S (1 + r)} = \frac{\tau\, r}{1 + r} $$
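The slope $d\omega/dS = \omega r / [S(1 + r)]$ can be checked numerically. Holding L fixed and substituting $T = W/S$ into $L = c_k W^p T^q$ gives $W^{p+q} S^{-q} = L/c_k$. Because $\omega = W/W_0$ differs from W only by the constant factor $W_0$, it suffices to check $dW/dS = W r / [S(1 + r)]$. The minimal sketch below assumes $L/c_k = 1$, and its helper names are hypothetical.

```python
# Minimal sketch: check dW/dS = W * r / [S (1 + r)] numerically, where
# W^(p+q) * S^(-q) = k is held constant (k = L / c_k, assumed equal to 1).

def effort_for_staff(s: float, p: float, q: float, k: float = 1.0) -> float:
    """Effort W that keeps W^(p+q) * S^(-q) equal to k."""
    return (k * s ** q) ** (1.0 / (p + q))


def slope_check(s: float, p: float, q: float, h: float = 1e-6):
    """Return (central-difference slope, analytic slope) at staff level s."""
    r = q / p
    w = effort_for_staff(s, p, q)
    numeric = (effort_for_staff(s + h, p, q) - effort_for_staff(s - h, p, q)) / (2 * h)
    analytic = w * r / (s * (1.0 + r))
    return numeric, analytic


if __name__ == "__main__":
    # Putnam's exponents p = 1/3, q = 4/3 give r = 4.
    print(slope_check(5.0, 1.0 / 3.0, 4.0 / 3.0))
```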

Let us now suppose that both the software equation and the
intercommunication overhead model agree at the point (L, W, T).
The two models can be equated by suitable choices of the
"technology constant," $c_k$, and individual productivity, $P_i$.

Then, in addition, let us suppose that the derivatives of effort
with respect to staff level for both models also agree at this
point. Such can only be attempted when r > 0, because the
derivative in the intercommunication overhead model is always
positive. When this is the case, the two models may be said to
agree in the neighborhood of the point (L, W, T).

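Thus, by equating the derivatives (in the overhead model,
$\omega = 1/[1 - t(S)]$, so that $d\omega/dS = \omega\, t'(S)/[1 - t(S)]$),
we arrive at a relationship between the parameters of the two models:

$$ \frac{S\, t'(S)}{1 - t(S)} = \frac{r}{1 + r} $$

or

$$ \eta = 1 - t(S) = \frac{S\, t'(S)\,(r + 1)}{r} $$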
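For an arbitrary overhead function, the neighborhood agreement condition $S\,t'(S)/[1 - t(S)] = r/(1 + r)$ can be solved numerically. The sketch below uses plain bisection on $[1, S_0]$. It assumes r > 0 and that the left-hand side increases with S, which holds for both example models in this paper. `neighborhood_staff` is a hypothetical helper, not part of the paper.

```python
import math
from typing import Callable


def neighborhood_staff(t: Callable[[float], float],
                       dt: Callable[[float], float],
                       r: float, lo: float, hi: float,
                       iters: int = 100) -> float:
    """Solve S t'(S) / [1 - t(S)] = r / (1 + r) for S by bisection on [lo, hi].

    Assumes r > 0 and that the left-hand side increases with S and brackets
    the target on [lo, hi] (true for both example models on [1, S0]).
    """
    target = r / (1.0 + r)

    def g(s: float) -> float:
        return s * dt(s) / (1.0 - t(s)) - target

    if g(lo) > 0.0 or g(hi) < 0.0:
        raise ValueError("target ratio is not bracketed on [lo, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    t0, t1, r = 0.1, 0.1, 4.0
    s0_lin = (1 + t0) / (2 * t0)
    s_lin = neighborhood_staff(lambda s: t0 * (s - 1), lambda s: t0, r, 1.0, s0_lin)
    s0_exp = 1 / t1
    s_exp = neighborhood_staff(lambda s: 1 - math.exp(-t1 * (s - 1)),
                               lambda s: t1 * math.exp(-t1 * (s - 1)), r, 1.0, s0_exp)
    print(f"linear: S/S0 = {s_lin / s0_lin:.4f}")
    print(f"exponential: S/S0 = {s_exp / s0_exp:.4f}")
```

With $t_0 = t_1 = 0.1$ and r = 4, the solver lands on $S/S_0 = r/(r + 0.5) \approx 0.889$ for the linear model and $r/(1 + r) = 0.8$ for the exponential model.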

Let us now examine this relationship for the two examples of the
interface overhead model.


Linear Intercommunication Overhead. Substitution of the
linear t(S) form into the neighborhood agreement condition yields

$$ S = \left[ \frac{2r}{1 + 2r} \right] \left[ \frac{1 + t_0}{2 t_0} \right] = \frac{S_0\, r}{r + 0.5} $$

This equation states that the staffing level is related to the
maximum effective staff point through the software exponent
ratio, r. At the Putnam value, r = 4, the staffing level is 89%
of the maximum effective level, and the team efficiency and
normalized effort are

$$ \eta = 0.556\,(1 + t_0) \approx 55\text{-}65\% $$

$$ \omega = 1/\eta = 1.8/(1 + t_0) \approx 1.5\text{-}1.8 $$

As seen in Figure 2, projects having this high an ω are at the
point where extra effort is very ineffective.


Exponentially Decaying Intercommunication Overhead. By
substituting the exponential form for t(S) into the neighborhood
agreement condition, we find

$$ S = \frac{r}{t_1 (1 + r)} = \frac{S_0\, r}{1 + r} $$

Again, we see that the staffing level is related to the maximum
effective staff via the exponent ratio. The Putnam value r = 4
produces

$$ S = 0.8\, S_0 $$

$$ \eta = \exp[-(S - 1)/S_0] = \exp[-0.8 + t_1] \approx 45\text{-}55\% $$

$$ \omega = 1/\eta = \exp[0.8 - t_1] \approx 1.8\text{-}2.2 $$

Although this example indicates a somewhat more comfortable
margin below maximum effective staffing than did the linear
model, it nevertheless shows an alarmingly low cost efficiency.
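The Putnam-value operating points for both models follow directly from the closed forms. Those are $S/S_0 = r/(r + 0.5)$ (linear) and $S/S_0 = r/(1 + r)$ (exponential), with $\eta = 1 - t(S)$ and $\omega = 1/\eta$. Below is a minimal Python sketch, with hypothetical function names and illustrative overhead factors.

```python
import math

# Minimal sketch: operating point implied by neighborhood agreement at a given r.
# Each function returns (S/S0, team efficiency eta = 1 - t(S), omega = 1/eta).

def linear_operating_point(r: float, t0: float):
    s0 = (1.0 + t0) / (2.0 * t0)
    s = s0 * r / (r + 0.5)
    eta = 1.0 - t0 * (s - 1.0)
    return s / s0, eta, 1.0 / eta


def exponential_operating_point(r: float, t1: float):
    s0 = 1.0 / t1
    s = s0 * r / (1.0 + r)
    eta = math.exp(-t1 * (s - 1.0))
    return s / s0, eta, 1.0 / eta


if __name__ == "__main__":
    r = 4.0  # Putnam's value
    for t in (0.05, 0.1, 0.2):
        lin = linear_operating_point(r, t)
        exp_ = exponential_operating_point(r, t)
        print(f"t={t}: linear S/S0={lin[0]:.3f} eta={lin[1]:.3f} omega={lin[2]:.2f} | "
              f"exponential S/S0={exp_[0]:.3f} eta={exp_[1]:.3f} omega={exp_[2]:.2f}")
```

For the linear model the output matches $\eta = (5/9)(1 + t_0)$ and $\omega = 1.8/(1 + t_0)$. For the exponential model it matches $\eta = \exp(-0.8 + t_1)$.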


IV. EXAMPLES USING AVAILABLE DATA

Several data sets of project resource statistics published
in the literature readily show that Putnam's value of r = 4 is not
universal. Specifically, Freburger and Basili [3] publish data
which yield the following 3-parameter best power-law fits:

$$ L_0 = 1.24\, W^{0.95}\, T^{-0.094} \quad (r = -0.1) $$

$$ L_1 = 0.22\, W^{0.78}\, T^{0.78} \quad (r = 1.0) $$

in which $L_0$ is kilo-lines of delivered code, and $L_1$ is developed
delivered code. It is interesting to note here that the former
relationship is nearly independent of T, whereas the latter shows
a definite beneficial W-T tradeoff characteristic. The negative
q in the former relationship indicates that, on a delivered-code
basis, added resources in one of the projects would have extended
the schedule! An equivalence between the software equation and
the intercommunication overhead model cannot be established when
r is zero or negative.

This data set is not the only one to show a negative q:
Boehm [4], in his Software Economics book, has a database used
to calibrate his COCOMO software cost model. A 3-parameter best
power-law fit to the adjusted data produces the relationship

$$ L = 0.942\, W^{0.675}\, T^{-0.028} \quad (r = -0.041) $$

Again, the tradeoff equation indicates that the projects in that
database were perhaps overmanned.

Gaffney [5], on the other hand, did a 3-parameter best
power-law fit of IBM data (Federal Systems Division, Manassas) to
arrive at the relationship

$$ L = c_k\, W^{0.63}\, T^{0.56} \quad (r = 0.88) $$

This last value of r aligns more closely with the Freburger-
Basili value for developed delivered code.
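The exponent ratio is simply $r = q/p$. This sketch recomputes it from the rounded exponents quoted in this section for the Putnam, Freburger-Basili, and Gaffney fits. It also flags whether neighborhood equivalence is possible (r > 0). The dictionary layout is illustrative only.

```python
# Minimal sketch: exponent ratio r = q/p from fitted exponents of L = c_k W^p T^q.
# The (p, q) pairs are the rounded values quoted in the text.

FITS = {
    "Putnam (p = 1/3, q = 4/3)": (1.0 / 3.0, 4.0 / 3.0),
    "Freburger-Basili, delivered code (L0)": (0.95, -0.094),
    "Freburger-Basili, developed delivered code (L1)": (0.78, 0.78),
    "Gaffney, IBM FSD Manassas": (0.63, 0.56),
}


def exponent_ratio(p: float, q: float) -> float:
    return q / p


if __name__ == "__main__":
    for name, (p, q) in FITS.items():
        r = exponent_ratio(p, q)
        note = "equivalence possible" if r > 0 else "no neighborhood equivalence (r <= 0)"
        print(f"{name}: r = {r:.3f}  ({note})")
```

The Gaffney exponents give $0.56/0.63 \approx 0.889$. Given the rounding of the exponents, that is consistent with the quoted r = 0.88.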


[Figure (plot not reproduced): team production efficiency, η, versus exponent ratio, r = q/p]

V. CONCLUSION

This article has shown that when there is a positive
effort-duration tradeoff relationship in a software project, it
is possible to estimate the team production efficiency and
proximity to maximum effective staffing. These figures can be
used to advantage by software managers who must judge the
effectiveness of increasing resources in order to shorten
schedules. The article also points out the necessity of keeping accurate
records of software project statistics, so that the parameters in
the model can be estimated accurately.

Low values of r in an organization are a mark to be proud
of, showing efficiency in terms of structuring subtasks for clean
interfaces. High (or negative) values of r may be indicative of
overall task complexity, volatility of requirements,
organizational inefficiency, or any number of other traits that
tend to hinder progress. The value of r may thus be treated as a
figure of merit: a measurable statistic indicative of the
efficiency of a set of projects in performance of assigned tasks.

The ratio $S/S_0$ is another indicator for management. When
low, it indicates that adding resources can potentially help a
project in trouble. If closer to unity, it is a warning that
adding resources may not help, will not appreciably shorten the
schedule, will incur expense at a low return in productivity,
and, if applied often in other projects, will thereby contribute
to an organizational reputation for expensive software.





REFERENCES 


1. Brooks, F. P., The Mythical Man-Month, Addison-Wesley Pub.
Co., Reading, MA, 1975.

2. Putnam, L. H., "Progress in modeling the software life
cycle in a phenomenological way to obtain engineering
quality estimates and dynamic control of the process,"
Second Software Life Cycle Management Workshop, sponsored by
US Army Computer Systems Command and IEEE Computer Society,
Atlanta, GA, Aug. 1978.

3. Freburger, K., and Basili, V. R., "The Software Engineering
Laboratory: Relationship Equations," Report TR-764,
University of Maryland Computer Science Center, College
Park, MD, May 1979.

4. Boehm, B. W., Software Economics, Prentice-Hall Publishing
Co., Englewood Cliffs, NJ, 1982.

5. Gaffney, J. E., "An Approach to Software Cost and Schedule
Estimation," submitted to Journal of [illegible]
Acquisition Management (pending).


THE VIEWGRAPH MATERIALS
for the

R. TAUSWORTHE PRESENTATION FOLLOW



R. Tausworthe 


STAFFING IMPLICATIONS 
OF SOFTWARE PRODUCTIVITY MODELS 



Robert C. Tausworthe 


RCT-1

12-1-82


• INTERCOMMUNICATIONS OVERHEAD MODELS 

• PUTNAM SOFTWARE EQUATION 

• COMBINED EFFECTS 

• CONCLUSIONS 


RCT-2
12-1-82


NOMENCLATURE 


L = LINES OF DELIVERED SOURCE CODE (THOUSANDS) 

W = WORK EFFORT (PERSON-MONTHS) 

S = AVERAGE FULL-TIME EQUIVALENT STAFF (PERSONS) 

P = PRODUCTIVITY (KILO-LINES OF CODE/PERSON-MONTH) 
R = TEAM PRODUCTION RATE (KILO-LINES/MONTH) 


RCT-3

12-1-82


INTERCOMMUNICATION OVERHEAD MODEL


t(S) = (INTERCOMMUNICATION TIME)/(HRS/MO. WORKED)

$P = P_i [1 - t(S)]$

$P_i$ = INDIVIDUAL PRODUCTIVITY DURING NON-INTERCOMMUNICATIONS

$L = P_i W [1 - t(S)]$

$R = P_i S [1 - t(S)]$

t(S) = 0 FOR S ≤ 1

t(S) INCREASES MONOTONICALLY FOR S > 1

RCT-4

12-1-82


EFFORT - DURATION TRADEOFF

INTERCOMMUNICATION OVERHEAD MODEL

$$ \frac{W}{W_0} = \frac{1}{1 - t(S)} $$

WHERE THE SINGLE-INDIVIDUAL-TASK $W_0$, $T_0$ VALUES ARE

$$ W_0 = T_0 = L/P_i $$

$W_0$ IS LEAST EFFORT REQUIRED
$T_0$ IS LONGEST TIME REQUIRED


RCT-5

12-1-82


• LINEAR INTERCOMMUNICATION OVERHEAD

$t(S) = t_0 (S - 1)$ FOR S > 1

$$ \tau_{\min} = \frac{4 t_0}{(1 + t_0)^2} \quad \text{AT} \quad \frac{W}{W_0} = \omega_0 = \frac{2}{1 + t_0} < 2 $$

• EXPONENTIAL DECAY INTERCOMMUNICATIONS OVERHEAD

$t(S) = 1 - \exp[-(S - 1)\, t_1]$

$$ \tau_{\min} = t_1 \exp(1 - t_1) \quad \text{AT} \quad \frac{W}{W_0} = \omega_0 = \exp[1 - t_1] < e $$

$S_0$ = STAFF SIZE AT $\tau_{\min}$ IS THE "MAXIMUM EFFECTIVE STAFF"


RCT-6

12-1-82


TIME - EFFORT TRADEOFF
LINEAR OVERHEAD
[Plot not reproduced; axis: duration ratio, T/T_0]

RCT-7
12-1-82


TIME - EFFORT TRADEOFF
EXPONENTIAL OVERHEAD
[Plot not reproduced; axis: duration, T/T_0]

RCT-8
12-1-82


PRODUCTION RATE
[Plot not reproduced]

RCT-9
12-1-82


SOFTWARE EQUATION


GENERAL FORM

$L = c_k W^p T^q$

• DENOTE r = q/p

• PUTNAM'S ORIGINAL EVALUATION

$L = c_k W^{0.33} T^{1.33}$

• DEFINES TIME-EFFORT TRADEOFF

• PUTNAM'S VALUE OF r = 4


RCT-10

12-1-82


NEIGHBORHOOD EQUIVALENCING


• ASSUME OVERHEAD MODELS DESCRIBE GLOBAL EFFECTS OF STAFF SIZE
ON PRODUCTIVITY FOR GIVEN L


• ASSUME SOFTWARE EQUATION EXPLAINS LOCALIZED BEHAVIOR IN
NEIGHBORHOOD OF A PARTICULAR (W, T) POINT FOR GIVEN L


• MAKE BOTH MODELS AGREE AT (W, T) AND HAVE SAME SLOPE AT THIS
POINT, FOR GIVEN L, BY PROPER CHOICE OF TECHNOLOGY CONSTANT,
$c_k$, AND INDIVIDUAL PRODUCTIVITY, $P_i$

• NEIGHBORHOOD EQUIVALENCE CRITERION

$$ \frac{S\, t'(S)}{1 - t(S)} = \frac{r}{1 + r} $$


RCT-11

12-1-82



LOCAL BEHAVIOR, LINEAR OVERHEAD
[Plot not reproduced; axis: duration ratio, T/T_0]

RCT-12
12-1-82


LOCAL BEHAVIOR, EXPONENTIAL OVERHEAD
[Plot not reproduced]

RCT-13
12-1-82


[Slide content not reproduced]

RCT-14
12-1-82


[Plot not reproduced; axis: team production efficiency, η]


EXPONENT RATIO DETERMINATIONS


• PUTNAM'S ORIGINAL VALUE, r = 4

• FREBURGER-BASILI (U. OF MD)

r = 1.0 (DEVELOPED, DELIVERED CODE)
r = -0.1 (DELIVERED CODE)

• GAFFNEY (IBM-MANASSAS)

r = 0.88

• BOEHM (TRW)

r = -0.041 (ADJUSTED DATA)
r = 0.086 (RAW DATA)


RCT-16

12-1-82


CONCLUSIONS 


• TIME AND EFFORT CAN BE TRADED ONLY SO FAR 


• THE EXPONENTS OF THE SOFTWARE EQUATION ARE RELATED TO THE $S/S_0$
RATIO, AND THEREFORE ARE INDICATORS OF HOW NEAR A PROJECT
IS TO BEING OVERSTAFFED


• WHEN $S/S_0$ IS NEAR UNITY, ADDITIONAL STAFFING WILL NOT HELP A
PROJECT


• IT IS NEVER EFFECTIVE TO APPLY MORE THAN TWICE THE SINGLE-
INDIVIDUAL-EFFORT TO SHORTEN SCHEDULE TIME


• THERE IS A NEED FOR MORE STATISTICAL STUDY OF r AS A FUNCTION
OF OTHER PROJECT CHARACTERISTICS


RCT-17

12-1-82





Estimates of Software Size 
From State Machine Designs 




Robert N. Britcher 

IBM, Federal Systems Division 

Gaithersburg, Md. 


John E. Gaffney* 
National Weather Service 
Silver Spring, Md. 


* On leave from IBM Corporation, Federal Systems Division 



J. Gaffney 
IBM 
1 of 26 


There is a clear need for improving the estimates of the amount of
function to be provided by a software system. State machine models (1, 2) are
being employed to record software designs as they evolve, so it appears
natural to attempt to derive estimates of the amount of code that will
ultimately result from these designs by using quantities directly available
from them as they are created. This paper demonstrates that the length, or
size (in number of source lines of code), of programs represented as state
machines can be reliably estimated in terms of the number of internal state
machine variables. Variables, here, are defined as the unique data required
by a state machine's transition function, not the data retained in the state
machine's memory. They are equivalent to Halstead's (3) operands. Data
collected from the SACDIN project (4) were used to develop software size
estimating formulas for a software system for which the state machine
representation is available at various levels of abstraction. Hence, the
methodology presented should be employable at successive stages of the
development process to provide estimates with, hopefully, increasing accuracy.

An important aspect of developing software is the derivation of estimates of
the amount of function (typically presented as a SLOC count) the system is to
provide. This paper presents code size estimation formulas that can be
successively applied as the design for a software system evolves. The
estimation of software size, and of development cost (assuming certain rates
in terms of man-months per thousand lines of code; see reference 5), can be made
relatively early in design and refined as the design effort proceeds. The
code size estimation formulas can be applied to a state machine
conceptualization of a software system at the highest level and to individual
procedures at the lowest.

A program can be regarded, and hence estimated, evaluated, and/or compared
with another program, in a number of different ways. Here, we are concerned
with two principal ways, the linguistic and the structural. From the
linguistic point of view, a program can be regarded as a string of tokens or
symbols. Halstead (3), who did pioneering work using the linguistic approach,
demonstrated a fundamental relationship between the size of the operand and
operator vocabulary and the length of the program text, stated in terms of the
number of tokens or symbols constituting it. This relationship is:

$$ N = \eta_1 \log_2 \eta_1 + \eta_2 \log_2 \eta_2 $$

where N = number of tokens, $\eta_1$ = operator vocabulary size, and $\eta_2$ =
operand vocabulary size.
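Halstead's length equation is easy to evaluate directly. The sketch below is a minimal illustration, and the vocabulary sizes in the example are hypothetical.

```python
import math

# Minimal sketch of Halstead's length equation,
# N = eta1 * log2(eta1) + eta2 * log2(eta2).

def halstead_length(eta1: int, eta2: int) -> float:
    """Estimated program length in tokens from the two vocabulary sizes."""
    return eta1 * math.log2(eta1) + eta2 * math.log2(eta2)


if __name__ == "__main__":
    # Hypothetical vocabulary sizes, for illustration only.
    print(halstead_length(16, 32))
```

For $\eta_1 = 16$ operators and $\eta_2 = 32$ operands it gives $N = 16 \cdot 4 + 32 \cdot 5 = 224$ tokens.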

In assembly code, the "operator" corresponds to the op. code symbol, and the
"operand" corresponds to the "address" or operand field of the instruction.
Also, I, the number of instructions, is proportional to N, the number of
tokens; that is, I = aN. In fact, $I \approx b\, \eta_2 \log_2 \eta_2$, as shown by
Gaffney for the case of AN/UYK-7 assembly code (9). Christensen et al. have
also observed that "program size is determined by the data that must be
processed by the program" (10). We assert that the "variable count", obtained
from the state machine design at the "procedure level" (as described more
fully below), corresponds to $\eta_2$, the operand vocabulary size in Halstead's
formulas. It is of interest to note that relationships similar to those
developed by Halstead and others for software, part of the material that may
be termed "software linguistics", have been noted between text length and
vocabulary size in natural languages by Herdan (6).

From the structural point of view, a program can be considered principally in
terms of data flow or in terms of function. In the former, the amount of
function, stated in terms of the number of lines of code, is related to the
data flow into and out of each module (see Kafura and Henry (7)) or into and
out of a program as a whole (see Albrecht (8)). In the function approach, the
number of unique inputs and outputs for a procedure, a module, or a program as
a whole is implied by the size of the function in that software element.
Here, however, we assert the equivalence between the Halstead approach and the
function approach by relating the number of variables in a state machine
procedure to the number of source lines of code: the variables are equivalent
to the operands in Halstead's formulas.

A program, or a subdivision of one such as a module, can be represented as a
"state machine", as depicted in Figure 1. The state machine consists of two
principal parts, the "transition function" and the "state data". The former
gives rise to the actual code; the latter is the "memory" of the program.

The transition function, call it "T", is a function whose elements are ordered
pairs of ordered pairs (2), to wit:

T = [(present state, input), (new state, output)].

Thus, "T" really symbolizes the combinational logic of the program, not
different in principle from a program without memory. The state machine
characterization of a program is an adaptation of the "Mealy-Moore" model of
sequential machines originally developed to represent automata in general and
telephone switching circuitry in particular (11).
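To make the ordered-pair view of T concrete, the sketch below stores a transition function as a Python dictionary. Each entry maps (present state, input) to (new state, output), and the machine is driven with an input sequence. The states and symbols are hypothetical and chosen only to show the shape of T. They are not taken from the SACDIN design.

```python
from typing import Dict, List, Tuple

# Minimal sketch: T as a table of ordered pairs
# (present state, input) -> (new state, output). States and symbols are hypothetical.
Transition = Dict[Tuple[str, str], Tuple[str, str]]

T: Transition = {
    ("idle", "open"): ("active", "prompt"),
    ("active", "data"): ("active", "ack"),
    ("active", "close"): ("idle", "bye"),
}


def run(table: Transition, state: str, inputs: List[str]) -> Tuple[str, List[str]]:
    """Apply T to each input; `state` plays the role of the machine's memory."""
    outputs: List[str] = []
    for symbol in inputs:
        state, out = table[(state, symbol)]  # pure lookup: the combinational logic
        outputs.append(out)
    return state, outputs


if __name__ == "__main__":
    print(run(T, "idle", ["open", "data", "data", "close"]))
```

Running it from `idle` on `open, data, data, close` returns to `idle` and emits `prompt, ack, ack, bye`.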

As described by Britcher and Moore (4), the SACDIN Dialog Manager was designed
using the state machine model. Some 8000 lines of code (S/370 assembly plus
some macros, including comments) were written, based on a state machine
decomposition consisting of 20 machines comprising 74 transitions, or
procedures. We derived several formulas by regression. One of them was:

$$ S = 8.825 \times V \log_e V $$

where S = estimated number of SLOC, including comments (about 40%).


The statistics of the fit to the data from which it was derived are given in
the table below:

                                     Relative error (1), (Ŝ - S)/S
Size estimating formula      Avg. by procedure   Std. deviation by procedure   Avg. overall
Ŝ = 8.825 × V log_e V        .027                .564                          -.0097
Ŝ = 21.3282 × V              .222                .518                          .0845
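Below is a sketch of the regression formula $S = 8.825\,V \log_e V$ and the relative-error statistic $(\hat{S} - S)/S$. The procedure data in the example are hypothetical. Two choices in it are assumptions, not statements from the paper: "average overall" is read as the relative error of the summed estimate against the summed actual size, and the standard deviation uses the sample (n - 1) form.

```python
import math
from typing import List, Tuple

# Minimal sketch: S = 8.825 * V * ln(V) estimates SLOC (including comments)
# for a state machine procedure with variable count V.

def estimate_sloc(v: int) -> float:
    return 8.825 * v * math.log(v)


def relative_error(estimate: float, actual: float) -> float:
    """(S_hat - S) / S, as in the fit tables."""
    return (estimate - actual) / actual


def fit_statistics(procedures: List[Tuple[int, float]]):
    """Mean and sample standard deviation of per-procedure relative error, plus an
    'overall' error of the summed estimate (an assumed interpretation)."""
    errors = [relative_error(estimate_sloc(v), s) for v, s in procedures]
    mean = sum(errors) / len(errors)
    std = math.sqrt(sum((e - mean) ** 2 for e in errors) / (len(errors) - 1))
    overall = relative_error(sum(estimate_sloc(v) for v, _ in procedures),
                             sum(s for _, s in procedures))
    return mean, std, overall


if __name__ == "__main__":
    # Hypothetical procedures: (variable count V, actual SLOC with comments).
    data = [(4, 50.0), (6, 100.0), (9, 160.0)]
    print([round(x, 3) for x in fit_statistics(data)])
```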



Figure 1

State Machine Representation of a Program

[Diagram not reproduced]

T = [(p. state, input); (n. state, output)]


Note: (1) Ŝ = estimated SLOC (with comments); S = actual SLOC (with comments).


The variable V is the "variable count" obtained from the state machine
design. It corresponds to $\eta_2$, the number of distinct "operands" (the operand vocabulary size) in Halstead's formulas.

The software code size formula, S = 8.825 × V log_e V, was verified using the
data from another major SACDIN software component, "Crypto". The relative
error, indicative of the degree of fit of the estimating formula to the Crypto
data, is tabulated below and compared with the corresponding figures
representing the degree of fit to the Dialog Manager.

Relative error                     Dialog Manager   Crypto
Overall                            -.0096           -.0474
Average by procedure               .027             -.1056
Standard deviation by procedure    .564             .8917


The relatively good fit of the size estimating formula derived from the Dialog 
Manager program and applied to the Crypto program supports our contention that 
the formula is a general one, applicable provided that proper design 
decomposition rules are followed. 

The data suggest that there are relationships between the counts of variables
in state machine representations of software designs and the amount of code
produced from the design. These relationships can be used to estimate code
size based on designs implemented using the state machine technology. The
data also suggest a connection between the state machine and Halstead
software models.

The formula for the number of SLOC, given above, can be converted to one
representing the number of assembly language SLOC without comments. The
expansion ratio of the language in which the SACDIN programs were written is
about 1.2, and these programs had about 40% comments. Therefore, S for assembly
code without comments is:

$$ S = 8.825 \times 1.2 \times 0.6 \times V \log_e V = 6.354\, V \log_e V $$
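The conversion coefficient is the product of the three stated factors, $8.825 \times 1.2 \times 0.6 = 6.354$. Below is a minimal sketch; the constant names are illustrative.

```python
import math

# Minimal sketch: convert the with-comments SLOC formula to assembly SLOC
# without comments, using the factors stated in the text.
SLOC_COEFFICIENT = 8.825     # S = 8.825 V ln V, SLOC including comments
EXPANSION_RATIO = 1.2        # expansion ratio of the SACDIN source language
NON_COMMENT_FRACTION = 0.6   # about 40% of lines were comments

ASSEMBLY_COEFFICIENT = SLOC_COEFFICIENT * EXPANSION_RATIO * NON_COMMENT_FRACTION


def estimate_assembly_sloc(v: int) -> float:
    """Assembly SLOC without comments for a procedure with variable count v."""
    return ASSEMBLY_COEFFICIENT * v * math.log(v)


if __name__ == "__main__":
    print(f"coefficient = {ASSEMBLY_COEFFICIENT:.3f}")
    print(f"V = 6 -> {estimate_assembly_sloc(6):.1f} assembly SLOC")
```

For V = 6 this gives about 68 assembly SLOC, which matches the per-procedure average used for the level 4 estimate.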

Any software system should be decomposable into 6 "levels", ranging from level
0, the initial program specification, through level 5, the code. The levels
are depicted in Figure 2. The formulas presented above were derived for
application at level 4, the procedure level. From this point of view of
levels, the design and code are essentially more detailed statements of the
requirements (the later ones addressed to the machine, while the earlier or
higher levels are addressed to people).

Since any software system should have the same number of decomposition or
specification levels, a system having more code should have proportionally
more "boxes" at each level. Hence, one should be able to produce an estimate
based on the number of boxes at a certain level, recognizing that, on the
average, about the same amount of function (and hence code count, for a
language at a certain level, e.g., assembly) should be resident in a "box" at a
given level in the specification hierarchy. A similar notion is used by some
hardware estimators. Based on experience, a hardware estimator might
estimate, for example, that a certain amount of function might require "about
1/2 type X box", where he is familiar with a "type X" box which is an element
of an existing system.


Figure 2

Levels of Specification

[Diagram not reproduced]

Based on the SACDIN data, we note that each level 4 procedure machine has an
average of 6 variables, and hence an average of 68 SLOC (assembly), since
$6.354 \times 6 \log_e 6 \approx 68$. Also, there is an average of 4 level 4
machines per level 3 machine. Hence, there is an average of 273 SLOC per level 3
machine. Finally, there is an average of 20 level 3 machines per level 2 machine,
suggesting an average of 5460 SLOC (assembly) per level 2 machine.
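The box-count reasoning can be written as a short roll-up from the level 4 formula. The fan-out values are the SACDIN averages quoted in the text. `FAN_OUT`, `estimate_from_boxes`, and the 7-box example are hypothetical.

```python
import math

# Minimal sketch of the box-count reasoning: average assembly SLOC per machine
# at each level, rolled up from the level 4 (procedure) formula.
ASSEMBLY_COEFFICIENT = 6.354
AVG_VARIABLES_PER_PROCEDURE = 6
FAN_OUT = {3: 4, 2: 20}  # level-4 machines per level-3 machine; level-3 per level-2


def procedure_sloc(v: float) -> float:
    return ASSEMBLY_COEFFICIENT * v * math.log(v)


def average_sloc_per_machine() -> dict:
    sloc = {4: procedure_sloc(AVG_VARIABLES_PER_PROCEDURE)}
    sloc[3] = FAN_OUT[3] * sloc[4]
    sloc[2] = FAN_OUT[2] * sloc[3]
    return sloc


def estimate_from_boxes(level: int, box_count: int) -> float:
    """Early estimate: number of boxes at a level times the average size per box."""
    return box_count * average_sloc_per_machine()[level]


if __name__ == "__main__":
    for level, s in sorted(average_sloc_per_machine().items(), reverse=True):
        print(f"level {level}: about {s:.0f} SLOC per machine")
    print(f"hypothetical design with 7 level-3 boxes: about {estimate_from_boxes(3, 7):.0f} SLOC")
```

The roll-up gives about 68, 273, and 5465 SLOC for levels 4, 3, and 2. The paper's 5460 comes from multiplying the rounded 273 by 20.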


Acknowledgement 

The authors express their thanks for the support provided by Mr. Don Zarefoss 
of IBM, FSD, Gaithersburg, Maryland during the course of the developments 
described here. 





REFERENCES 


1. Linger, R. C., Mills, H. D., and Witt, B. I., "Structured Programming:
Theory and Practice," Addison-Wesley, 1979, p. 32.

2. Ferrentino, A. B., and Mills, H. D., "State Machines and Their Semantics
in Software Engineering," IEEE COMPSAC, Chicago, Fall 1977.

3. Halstead, M. H., "Elements of Software Science," Elsevier, 1977.

4. Britcher, R. N., and Moore, A. R., "Increased Productivity Through the Use
of Software Engineering in an Industrial Environment," IEEE Computer
Society Fifth International Computer Software and Applications
Conference, November 1981, IEEE Catalog No. 81CH1698-0.

5. Cruickshank, R. D., and Lesser, M., "An Approach to Estimating and
Controlling Software Development Costs," in "The Economics of
Information Processing," Vol. 2, p. 139, Springer-Verlag, 1982.

6. Herdan, G., "The Theory of Language as Choice and Chance,"
Springer-Verlag, 1966, p. 86 and other pages.

7. Henry, S., and Kafura, D. H., "Software Structure Metrics Based on
Information Flow," IEEE Transactions on Software Engineering, Vol.
SE-7, No. 5, September 1981, p. 510.

8. Albrecht, A. J., "Measuring Application Development Productivity,"
Proceedings IBM Applications Development Symposium, Monterey,
California, October 14-17, 1979, GUIDE International and SHARE, Inc.,
IBM Corporation, p. 83.

9. Gaffney, J. E., "Software Metrics: A Key to Improved Software Development
Management," presented March 1981, Pittsburgh, at the conference
"Computer Science and Statistics: 13th Symposium on the Interface";
proceedings also published by Springer-Verlag, 1981.

10. Christensen, K., Fitsos, G. P., and Smith, C. P., "A Perspective on
Software Science," IBM Systems Journal, Vol. 20, No. 4, 1981,
pp. 372-387.

11. Savage, J. E., "The Complexity of Computing," Wiley, 1976, No. 11.



THE VIEWGRAPH MATERIALS 
for the 

R. BRITCHER/J. GAFFNEY PRESENTATION FOLLOW 





ESTIMATES OF SOFTWARE SIZE
FROM 

STATE MACHINE DESIGNS 


R. N. Britcher

IBM, Federal Systems Division,
Gaithersburg, Md.


J. E. Gaffney, Jr.*
National Weather Service
Silver Spring, Md.


Presentation at


SEVENTH ANNUAL SOFTWARE ENGINEERING WORKSHOP
NASA, GODDARD SPACE FLIGHT CENTER
DECEMBER 1, 1982


* On leave from IBM, Federal Systems Division



SOFTWARE DEVELOPMENT WORK EFFORT ESTIMATION 
THE STATE MACHINE MODEL 
SOFTWARE SCIENCE/LINGUISTICS BACKGROUND 
STATE MACHINE/SOFTWARE LINGUISTICS EQUIVALENCE 





MOTIVATION 


ESTIMATION OF AMOUNT OF FUNCTION PROBABLY MORE 
DIFFICULT THAN ESTIMATION OF WORK RATES. 

MORE HAS BEEN DONE ON ESTIMATING WORK RATES THAN 
SOFTWARE SIZE. 

NEED TO QUANTIFY REQUIREMENTS IN TERMS OF LIKELY
AMOUNT OF CODE IMPLIED BY THEM.

SUCCESSIVE REFINEMENT FROM REQUIREMENTS TO CODE
SHOULD BE MATCHED BY ESTIMATION PROCESS.



SOFTWARE DEVELOPMENT 
WORK EFFORT 
ESTIMATION METHODOLOGY 


WORK HOURS = WORK RATE × AMOUNT OF SOFTWARE FUNCTION

SOME MEASURES OF SOFTWARE FUNCTION

• SOURCE LINES OF CODE

• OPERANDS

• STATE MACHINE VARIABLES


WORK HOUR ESTIMATION PROCEDURE

ESTIMATE AMOUNT OF SOFTWARE "FUNCTION"
ESTIMATE WORK HOURS



SOFTWARE FUNCTION MEASURES 

LINGUISTIC: REPRESENTS A PROGRAM AS A SEQUENCE OF SYMBOLS, 
EQUIVALENT TO DISCOURSE 

• SOFTWARE SCIENCE 

• OPERANDS 

STATE MACHINE: REPRESENTS A PROGRAM AS A FUNCTION WITH 
MEMORY 

• MATHEMATICAL CONCEPT 

• SEQUENTIAL LOGIC 

• VARIABLES



STAGES OF REFINEMENT OF
SOFTWARE DEFINITION


REFINE DETAIL
REFINE ESTIMATE


REQUIREMENTS
DESIGN
INPUTS/OUTPUTS
DESIGN LANGUAGE
CODE
SLOC

[Diagram not reproduced]


HALSTEAD SOFTWARE SCIENCE/LINGUISTICS
MODEL OF A PROGRAM

$$ N = \eta_1 \log_2 \eta_1 + \eta_2 \log_2 \eta_2 $$

N = NO. OF TOKENS
$\eta_1$ = OPERATOR VOCABULARY SIZE
$\eta_2$ = OPERAND VOCABULARY SIZE

EXAMPLE:  LA  X
LA = OPERATOR (OP. CODE); X = OPERAND (ADDRESS)

$$ N = a\, \eta_2 \log_2 \eta_2, \qquad \text{NO. OF SLOC} = b\, \eta_2 \log_2 \eta_2 $$

$\eta_2$ = NO. OF INPUTS/OUTPUTS AT ALGORITHM LEVEL


STATE MACHINE MODEL


APPLIES TO PROGRAMS AT VARIOUS LEVELS OF ABSTRACTION
OVERALL TO INDIVIDUAL PROCEDURE

APPLICABLE AT SUCCESSIVE LEVELS OF REFINEMENT

BASED ON THE MEALY-MOORE MODEL OF SEQUENTIAL MACHINES
DEVELOPED 25 YEARS AGO

MAPS GENERALIZATION OF "INPUT" (PRESENT PLUS PAST) TO
"OUTPUT" (PRESENT)


State Machine Representation of a Program

[Diagram not reproduced]

T = [(p. state, input); (n. state, output)]




Levels of Specification

[Diagram not reproduced]


FAN-OUT OF MACHINES

AT SUCCESSIVE LEVELS OF REFINEMENT OF DETAIL

[Diagram not reproduced; legend: A, AVERAGE]


ESTIMATION METHODOLOGY 


• THERE ARE THE SAME NUMBER OF LEVELS, REGARDLESS OF AMOUNT 
OF CODE 

• EARLIER ESTIMATES: 

• DECOMPOSE OVERALL REQUIREMENT INTO SUCCESSIVELY 
DETAILED STRUCTURE OF "BOXES" AT DIFFERENT "LEVELS" 

• COUNT NUMBER OF BOXES AT LOWEST "LEVEL" OF DETAILING,
MULTIPLY BY "AVERAGE" NUMBER OF INSTRUCTIONS.

• METHOD ANALOGOUS TO HARDWARE "FUNCTION" ESTIMATION
BY BOX COUNT, THEN MULTIPLYING BY "AVERAGE" COST OF
BOX.


• LATER ESTIMATES:

• COUNT NUMBER OF VARIABLES PER PROCEDURE
• APPLY FORMULA FOR EACH PROCEDURE TO GET SIZE ESTIMATE.


STATE MACHINE MODEL ESTIMATING FORMULAS

LEVEL NO.   LEVEL NAME    ESTIMATING FORMULA (ASSEMBLY CODE)
4           PROCEDURE     6.354 V log_e V    (68)
3           MODULE        25.416 V log_e V   (273)
2           INTEGRATION   (5460)

WHERE: V = THE STATE MACHINE "VARIABLE COUNT" (AT THE PROCEDURE
LEVEL); IT CORRESPONDS TO HALSTEAD'S $\eta_2$, THE "OPERAND"
VOCABULARY SIZE.


DEGREE OF FIT OF ESTIMATING FORMULA

RELATIVE ERROR                     DEFINING SYSTEM   VERIFICATION SYSTEM
Overall                            -.0096            -.0474
Average, by procedure              .027              -.1056
Standard deviation by procedure    .564              .8917








"THE CRUCIAL INGREDIENT OF SCIENCE. THIS IS 
THE HABIT OF MIND THAT LINKS CURIOSITY WITH
DISCIPLINED, RIGOROUS, SUSTAINED INVESTIGATION 
TO EXPAND THE LIMITS OF KNOWLEDGE". 


WILLIAM K. STEVENS
"THE NEW YORK TIMES"

NOV. 9, 1982
PAGE C-1





ATTENDANCE LIST, DECEMBER 1, 1982






,I 0 M *G n i* 

HUM 

PH^ljT in T\,1 r-A'MV 

» 0 ««.»T An,^l ( n 
rvcRe-nv AV^n.s 


a ® i *‘ t 0 rfs p APC u 

t.'V'" 

«J 3 C^VH; SPrfVTrF 

k c ’;; c, A°C w *. ^ A svs 

of i^n 

A® I M C S r A®r t » '’DOB 


ft ft) f ni\T|,ry 
. TiJ«* n’\0^cr>fit 1 P 

ft L r X A \ Pf ® nA®‘Vl p S 

® A M 0 V n a n i' i> 

VI'" n a <5 I f i 
.tiv.w R|.-o«v 
TLlSfcOH ^ts« 0® 
M I r H r L T tv 
CH r RYb »TfT l( |« , « 
ngOQPAV OijC H m -pav l s 
lAr K B ° 'V n 
n a v i n 

P A r, b WHij TH®n v 0 

frijftf® r - nwn^unPf' 

BI''hM r ' PR r 0FS<">iM 
n A T , f ^ C F M \) c |,1 ft ,<) 

A. 

FRFD HDj<?ST 

CYNTHIA BROWN 

VI |4 D 1 J W 1,1 

OQ M M Pil("K T 'A V D 
rrtam pj^gfr 
tom pijo JC 
.TuPEi °n 3 n R T 0 M 


CSC

pw&s 


CSC 

UNIV OF MD
"•SA 

NASA/HQ


CSC

GENERAL ELECTRIC
GENERAL ELECTRIC


■•IP A 

FLEET MATERIAL SUPPORT
FEDERAL JUDICIAL CENTER
IITRI


HUD

CSC

CTA 


OFFICE 


GSFC 


r:s fo 

Rtrio-ITO CQMSPJjT A V'C< 

TEXAS INSTRUMENTS
MITRE CORP
USDA


.TJM ("AMMT^C VPI 

OA"fc C"R n r*c 

TQPN C * R r C^F 1 " 

T -L n Y n Cappfmtf» G^FC 

• TQMN C n R«'iM G U ’U 

J£ r F C u f. v G e C 

«?XFVF OmPUVROsjt C^c 

LOUIS CHMURA NRL

VIC CHURCH CSC

PAUL CLEMENTS NRL

CA°Y C n A T .$ MSA 

Tgn rorn® av p ; itt°i 

R . C r 'R T <i 7 IV ft S A / u Q 




ORIGINAL PAGfc* IS 
OF POOR QUALITY 

ROBERT i:«<ITC<SHA v K 

i G »1 

ray ro«JL r Y 

• ba 

«m, D» r c w 'p p 

CSC 

nyNCAA. !)^ v . w APt-' p lV<?KTr) 

USDA

CHARGE* ^ T ^ U w 

ncnn 

,TO m N ht i* (“K'u A m S 

USDA

^A v I n n IS^T\, 

BilREA'I 

vTI M niJQ4T\« 

l P ..i 

nE w N T S 0, "J* 

«r c - TP / : AERpSPAc®* company 

p A»»L D n l- RF» l''G p R 

ip'lV n f mp 

M I''KF*Y U ' ( M T H 'P 

VS A 

TU M n U *’ U 

l» S 

R A ,f E KrK«A w 0T 1R 

NASA-LANGLEY

QETSV FO'-'J 15 .)* 

CSC 

'•'A T 'T p Q KTLTS 

I n >i 

° , u i ' f :r£ o\, 

GSFC 

MARY ANN ESFANDIARI

GSFC

SUELLEN ESLINGER

CSC

FOR T A PT"r°F.n^F 

^ONTyf TW P 1) SYS 

W I T i L T A M c A Q Q 

VSrffC 

WARCTA FT,^ f ,o 

SY V S O p V C n R ,:> 

r A T H v FRA",* 

GSFC

YlJRY F D K M K r r.

CSC

RAW p R T (■- n M A i\i

UNIV OF MD

JOHN GAFFNEY

NATIONAL WEATHER SERVICE

RI^HARH .T, GAl,F 

T°I-TAC pPFTCF 

<RARY G"Q n 

BURROUGHS CORP

PATRICK p A 0 Y 

GSFC 

carl", riAMMn 

p r a f c c r c 

KET’rw GPL 

CSC

AM°IT (RfiPu

SYRACUSE UNIV

J0SF D H G n G fT E’W 

S°I 

A’A M C V r n n O v AW 

GSFC

A. J. ^RACF

IBM

ART 

CSC 

P D (?R E r MR IJRG 

iPl 

.tgr ^r^g^r 

MSA 

pick hamyl^uw 

BELL LABORATORIES

UQWM HAS u WALT,

RESEARCH & DATA SYS

C LT.E V MEORTMG

GSFC

PO"G HTLT.M p R

CENSUS BUREAU

Tom MonGSQV

MITRE

BARBARA «QT ^FS 

GSC 

ADRIOM H n i)X 

JPL 


RAY HOUGHTON

NATIONAL BUREAU OF STANDARDS

HAM WQWAOJM

USDA

WIT.LTAW vumpnrfy 

DOTY 

DAVID HUTCHENS

UNIV OF MD

NQRMM* inELSHrt

IITRI

ROM ^A«L C CKI 

O n D 

iT£M(\fV j A oq»IER 

GSFC

TITjlTAM iamipson

GSFC

nAvin jofstimg

BENDIX

TH®IS JQM£R

IITRI

ROBERT JMQ^K

IBM

DENNIS KAFURA

VPI

OWPN KAR^A'T’ZKE

GSFC

PET* K»T7. 

UN IV OF MO 

FRANCES KA^LAUSKT 

NAVOAr 

mi&a^nf KIMGSTOJi 

USDA

RE p NARO *' KT..ETN

IBM/FSD

RICH KMMK*L

FORD AEROSPACE

JOHN KNIGHT

UNIV OF VA

RICHARD KNOX 

CRC 

JO »U KOGl'T 

RESEARCH & DATA SYS

MA^CV framRR

GSFC

JEFF K"HM 

sasc 

WIRT* ».A*S 

IBM

ROPFJPT L»RBQN

USDA

«A»CV T.A f »BFN^HAL

GSFC

KA»EW t.EAD^R

IITRI

FRMJF 

G«C 

GEPTPUPE LFE 

DOTY AfiRrjr 

Raymond T.EPE«onenR 

GSFC

FAPL L p VTT* r

SPI

JAY LIEBOWITZ

GWU

ANWNV M A TOVE

GSFC

RE M RV VAT,E r 

ITT 

narefma vapohf 

GSFC

JERRV maosM

IITRI

THOMAS MASTERS 

NR A 

J. E, wath f WR 

BENDIX

TOM VARTAN 

NR A 

ANM MAPir WCCAPE 

BURROUGHS CORP

W. L. wcroY

FAA

FRANK MCGARRY

GSFC

M A p Y ANN MTGARRY 

ITTRI 

JO^N MfpMEP 

DEPT OF COMMERCE





Fq MFDFIRQP 
pgr mefsom 
PHIL MERWARTH
f>A^m MirHBET, 
TS«0 MTy AM^*T n 
KA°E W W0 p 
S. MOHANTY
JOHN MUSA


CSC
CTA
CPF C

FLEET MATERIAL SUPPORT OFFICE

U M »P

f,PFC

0P0 p X I^C
BELL LABS


MATTHEW MAOELM A N 
CH p I« WA°J"S 
BERNIE NARROW 
pop Mgr . son 
PQPEPT NTTCHMAM 
pqpf.pt NHqmam 


CSC

MBA

GSFC

GSFC

FLEET MATERIAL SUPPORT OFFICE
COLLEGE OF WILLIAM & MARY


OH ARLES nEBTERFITHFR 
*>A M L OWQPU* 

TOM nSTRANO 
THOMAS QffSTEPIOGF 


MITRE CORP

GSFC

SPERRY/UNIVAC
U.S. POSTAL SERVICE


JERRY PAGE
GERRY PARCPVFR 
RAYMONn PA"l 

tEONTE PFiyjWy 

WAITER PFN w Y 

farl petfrs 

JOHN PTETRAS 
MICHA^'j PJ j FT' t ’ 
BILL POSTHUMA
JERRY PRENTICE
DOUGLASS p»'T M A M 


CSC

Hl *0

NAVAL SURFACE WEAPONS CENTER

USDA

USDA

GSFC

MITRE

CSC

GSFC

HMD 

QOA m T s / pr MOMT 


JIM RAMSEY

CHWAYA RAO
GEORGE RAl'TE

GERALDINE RIZZARDI
SAM REDWINE
SALLY RICHMOND
DOW RO«BTNB
MJMI RrtBFRTSOM
JIM ROBINSON
WILLIAM ROBINSON
JOHN ROC^ARO
MICHAEL ROHLEDER
JORGE LUIS ROMEU
FYLE ROUF
LESLIE RUSHBROOK
rqm rutlpdge


UNIV OF MD
GENERAL ELECTRIC
USDA

A p DRC/S p S

MITRE

CSC

NSA

IITRI

NBA

SAC M S/FRE p MAN APS n C
BURROUGHS CORP
CSC
IITRI
IBM

IBM/FSD

DOT/TPC





.TQWfo SARP 
PA^L S' , Me>'V> , P 
RU^fe) 0 PC w O T T=*iV 
T.fiF sc M U‘«Arn p R 
DITriRR^ Pfe-.TriV 
PA"L SPRftp'TM 
TEOg.<A S«K B, f« 
SYLVIA SHEPPARD
WA°rV A' f 

nA^i^ 

neMt s m i t h 
■ S* ; 1 T H 

VATHP><W «?MTXM 
■JEPRY 

ffb p M M S<mV)«-'R 
*V-l n T snc,TOVAV 
.'»Q U ‘M 5 r»s 
wiTc«E*,lj S«t p G p li 
JO«£PH STl'K^H* 

roped r s T EPH* r '^ 

T. STE V E*S 
T^ORY ?;rPA®’T p K 
RT P V P Rijn0TT u 

PAUL SZULEWSKI


l $ r, f T WARF A K P 
MARTIN MARIETTA AEROSPACE

MtpOfj c n K p
U*Jp r HK

c;rf^

GENERAL ELECTRIC

N.Y. POLYTECHNIC INSTITUTE

I n y 

W'* 

•j s ^ r 

rt AS^-M^GT EY 

GENERAL DYNAMICS

CSC

Y A b p M.>| T V
GSFC

CnyT^T, TrjTf) svs
00 Ai

NASA/HQ

*JR Of

rjFlMPRftL. OV.-JA^TTR
CSC

DRAPER LAB


KEIJI TASAKI
PG°EP i’ rAU p *^W T HF

WAYNE TAYLOR

.1A M ER ^T 0 t ja, T p
KE M N P T W ' P Q M
PE^SY Tijo \jn


GSFC

JPL

CSC

\i*A 

API**C RFSFAPCH CORp 
C C C 


PA f ’L p T T E V ® M n n r M a M 
M. VTbARPO 
RU*A*» VQT^T 


l.lPlV) 

RESEARCH & DATA SYS
NASA-LANGLEY


prmCF MA"'!> T NGT r ',\!
.TA^K W^J p RTC«
SHARON WALIGORA
PKRRf.fcff w A T i b A C r
DOLORES WALLACE
OAORY W A Y S n iv

»0 M WE p KR

DAVID WEISS

ELAINE WEYUKER
VIRGINIA WILLIAMS
PA T, L WTLT I«
altcf r «0 M G


BURROUGHS CORP

DOT

CSC

RAYTHEON SERVICE CO

NATIONAL BUREAU OF STANDARDS
IITRI

ono

U.S. NAVAL RESEARCH LAB
COURANT INSTITUTE
GSFC

POLYTECHNIC INSTITUTE
OHT


or l




AYfon^n ve« 

*.H&rtT,E* YQMM»N 
■’RRMTOTfiF VQtis.SECT 


UNIV OF MD
cry EWT^RPRTS^S
UNIV OF MD


SA"L ZAV p L p R

M. ZELKOWITZ


A r D*C

UNIV OF MD



BIBLIOGRAPHY OF SEL LITERATURE 


The technical papers, memorandums, and documents listed in 
this bibliography are organized into two groups. The first 
group is composed of documents issued by the Software Engi- 
neering Laboratory (SEL) during its research and development 
activities. The second group includes materials that were 
published elsewhere but pertain to SEL activities. 

SEL-Originated Documents

SEL-76-001, Proceedings From the First Summer Software
Engineering Workshop, August 1976

SEL-77-001, The Software Engineering Laboratory,
V. R. Basili, M. V. Zelkowitz, F. E. McGarry, et al., May 1977

SEL-77-002, Proceedings From the Second Summer Software
Engineering Workshop, September 1977

SEL-77-003, Structured FORTRAN Preprocessor (SFORT), B. Chu
and D. S. Wilson, September 1977

SEL-77-004, GSFC NAVPAK Design Specifications Languages
Study, P. A. Scheffer and C. E. Velez, October 1977

SEL-78-001, FORTRAN Static Source Code Analyzer (SAP)
Design and Module Descriptions, E. M. O'Neill,
S. R. Waligora, and C. E. Goorevich, February 1978

†SEL-78-002, FORTRAN Static Source Code Analyzer (SAP)
User's Guide, E. M. O'Neill, S. R. Waligora, and
C. E. Goorevich, February 1978

SEL-78-102, FORTRAN Static Source Code Analyzer Program
(SAP) User's Guide (Revision 1), W. J. Decker and
W. A. Taylor, September 1982

SEL-78-003, Evaluation of Draper NAVPAK Software Design,
K. Tasaki and F. E. McGarry, June 1978


† This document superseded by revised document.


B-1





SEL-78-004, Structured FORTRAN Preprocessor (SFORT)
PDP-11/70 User Guide, D. S. Wilson and B. Chu, September
1978

SEL-78-005, Proceedings From the Third Summer Software
Engineering Workshop, September 1978

SEL-78-006, GSFC Software Engineering Research Requirements
Analysis Study, P. A. Scheffer and C. E. Velez, November 1978

SEL-78-007, Applicability of the Rayleigh Curve to the SEL
Environment, T. E. Mapp, December 1978

SEL-79-001, SIMPL-D Data Base Reference Manual,
M. V. Zelkowitz, July 1979

SEL-79-002, The Software Engineering Laboratory: Relationship
Equations, K. Freburger and V. R. Basili, May 1979

SEL-79-003, Common Software Module Repository (CSMR) System
Description and User's Guide, C. E. Goorevich, A. L. Green,
and S. R. Waligora, August 1979

SEL-79-004, Evaluation of the Caine, Farber, and Gordon
Program Design Language (PDL) in the Goddard Space Flight
Center (GSFC) Code 580 Software Design Environment,
C. E. Goorevich, A. L. Green, and W. J. Decker, September
1979

SEL-79-005, Proceedings From the Fourth Summer Software
Engineering Workshop, November 1979

SEL-80-001, Functional Requirements/Specifications for
Code 580 Configuration Analysis Tool (CAT), F. K. Banks,
A. L. Green, and C. E. Goorevich, February 1980

SEL-80-002, Multi-Level Expression Design Language -
Requirement Level (MEDL-R) System Evaluation, W. J. Decker
and C. E. Goorevich, May 1980

SEL-80-003, Multimission Modular Spacecraft Ground Support
Software System (MMS/GSSS) State-of-the-Art Computer
Systems/Compatibility Study, T. Welden, M. McClellan, and
P. Liebertz, May 1980

SEL-80-004, System Description and User's Guide for Code 580
Configuration Analysis Tool (CAT), F. K. Banks,
W. J. Decker, J. G. Garrahan, et al., October 1980

SEL-80-005, A Study of the Musa Reliability Model,
A. M. Miller, November 1980


B-2 







SEL-80-006, Proceedings From the Fifth Annual Software
Engineering Workshop, November 1980

SEL-80-007, An Appraisal of Selected Cost/Resource Estimation
Models for Software Systems, J. F. Cook and F. E. McGarry,
December 1980

†SEL-81-001, Guide to Data Collection, V. E. Church,
D. N. Card, F. E. McGarry, et al., September 1981

SEL-81-101, Guide to Data Collection, V. E. Church,
D. N. Card, F. E. McGarry, et al., August 1982

SEL-81-002, Software Engineering Laboratory (SEL) Data Base
Organization and User's Guide, D. C. Wyckoff, G. Page, and
F. E. McGarry, September 1981

SEL-81-003, Software Engineering Laboratory (SEL) Data Base
Maintenance System (DBAM) User's Guide and System
Description, D. N. Card, D. C. Wyckoff, and G. Page,
September 1981

†SEL-81-004, The Software Engineering Laboratory,
D. N. Card, F. E. McGarry, G. Page, et al., September 1981

SEL-81-104, The Software Engineering Laboratory, D. N. Card,
F. E. McGarry, G. Page, et al., February 1982

†SEL-81-005, Standard Approach to Software Development,
V. E. Church, F. E. McGarry, G. Page, et al., September 1981

SEL-81-105, Recommended Approach to Software Development,
S. Eslinger, F. E. McGarry, and G. Page, May 1983

SEL-81-006, Software Engineering Laboratory (SEL) Document
Library (DOCLIB) System Description and User's Guide,
W. Taylor and W. J. Decker, December 1981

†SEL-81-007, Software Engineering Laboratory (SEL) Compendium
of Tools, W. J. Decker, E. J. Smith, A. L. Green,
et al., February 1981

SEL-81-107, Software Engineering Laboratory (SEL) Compendium
of Tools, W. J. Decker, W. A. Taylor, and E. J. Smith,
February 1982


† This document superseded by revised document.






SEL-81-008, Cost and Reliability Estimation Models (CAREM)
User's Guide, J. F. Cook and E. Edwards, February 1981

SEL-81-009, Software Engineering Laboratory Programmer
Workbench Phase 1 Evaluation, W. J. Decker and
F. E. McGarry, March 1981

SEL-81-010, Performance and Evaluation of an Independent
Software Verification and Integration Process, G. Page and
F. E. McGarry, May 1981

SEL-81-011, Evaluating Software Development by Analysis of
Change Data, D. M. Weiss, November 1981

SEL-81-012, The Rayleigh Curve As a Model for Effort
Distribution Over the Life of Medium Scale Software Systems,
G. O. Picasso, December 1981

SEL-81-013, Proceedings From the Sixth Annual Software
Engineering Workshop, December 1981

SEL-81-014, Automated Collection of Software Engineering
Data in the Software Engineering Laboratory (SEL),
A. L. Green, W. J. Decker, and F. E. McGarry, September 1981

SEL-82-001, Evaluation of Management Measures of Software
Development, G. Page, D. N. Card, and F. E. McGarry,
September 1982, vols. 1 and 2

SEL-82-002, FORTRAN Static Source Code Analyzer Program
(SAP) System Description, W. A. Taylor and W. J. Decker,
August 1982

SEL-82-003, Software Engineering Laboratory (SEL) Data Base
Reporting Software User's Guide and System Description,
P. Lo, September 1982

SEL-82-004, Collected Software Engineering Papers:
Volume 1, July 1982

SEL-82-005, Glossary of Software Engineering Laboratory
Terms, M. G. Rohleder, December 1982

SEL-82-006, Annotated Bibliography of Software Engineering
Laboratory (SEL) Literature, D. N. Card, November 1982

SEL-82-007, Proceedings From the Seventh Annual Software
Engineering Workshop, December 1982

SEL-82-008, Evaluating Software Development by Analysis of
Changes: The Data From the Software Engineering Laboratory,
V. R. Basili and D. M. Weiss, December 1982


B-4 









SEL-Related Literature

††Bailey, J. W., and V. R. Basili, "A Meta-Model for Software
Development Resource Expenditures," Proceedings of
the Fifth International Conference on Software Engineering.
New York: Computer Societies Press, 1981

Banks, F. K., "Configuration Analysis Tool (CAT) Design," 
Computer Sciences Corporation, Technical Memorandum, March 
1980 

††Basili, V. R., "Models and Metrics for Software Management
and Engineering," ASME Advances in Computer Technology,
January 1980, vol. 1

Basili, V. R., "SEL Relationships for Programming Measurement
and Estimation," University of Maryland, Technical
Memorandum, October 1979

Basili, V. R., Tutorial on Models and Metrics for Software
Management and Engineering. New York: Computer Societies
Press, 1980 (also designated SEL-80-008)

††Basili, V. R., and J. Beane, "Can the Parr Curve Help With
Manpower Distribution and Resource Estimation Problems?",
Journal of Systems and Software, February 1981, vol. 2,
no. 1

††Basili, V. R., and K. Freburger, "Programming Measurement
and Estimation in the Software Engineering Laboratory,"
Journal of Systems and Software, February 1981, vol. 2,
no. 1

Basili, V. R., and B. T. Perricone, Software Errors and
Complexity: An Empirical Investigation, University of
Maryland, Technical Report TR-1195, August 1982

††Basili, V. R., and T. Phillips, "Evaluating and Comparing
Software Metrics in the Software Engineering Laboratory,"
Proceedings of the ACM SIGMETRICS Symposium/Workshop:
Quality Metrics, March 1981


†† This article also appears in SEL-82-004, Collected Software
Engineering Papers: Volume 1, July 1982.


B-5 


Basili, V. R., R. W. Selby, and T. Phillips, Metric Analysis
and Data Validation Across FORTRAN Projects, University of
Maryland, Technical Report, November 1982

Basili, V. R., and R. Reiter, "Evaluating Automatable Measures
for Software Development," Proceedings of the Workshop
on Quantitative Software Models for Reliability, Complexity
and Cost, October 1979

Basili, V. R., and D. M. Weiss, A Methodology for Collecting
Valid Software Engineering Data, University of Maryland,
Technical Report TR-1235, December 1982

Basili, V. R., and M. V. Zelkowitz, "Designing a Software
Measurement Experiment," Proceedings of the Software Life
Cycle Management Workshop, September 1977

††Basili, V. R., and M. V. Zelkowitz, "Operation of the Software
Engineering Laboratory," Proceedings of the Second
Software Life Cycle Management Workshop, August 1978

††Basili, V. R., and M. V. Zelkowitz, "Measuring Software
Development Characteristics in the Local Environment,"
Computers and Structures, August 1978, vol. 10

Basili, V. R., and M. V. Zelkowitz, "Analyzing Medium Scale
Software Development," Proceedings of the Third International
Conference on Software Engineering. New York:
Computer Societies Press, 1978

††Basili, V. R., and M. V. Zelkowitz, "The Software
Engineering Laboratory: Objectives," Proceedings of the
Fifteenth Annual Conference on Computer Personnel Research,
August 1977

Card, D. N., "Early Estimation of Resource Expenditures and 
Program Size," Computer Sciences Corporation, Technical 
Memorandum, June 1982 

Card, D. N., "Comparison of Regression Modeling Techniques 
for Resource Estimation," Computer Sciences Corporation, 
Technical Memorandum, November 1982 

Card, D. N., and M. G. Rohleder, "Report of Data Expansion 
Efforts," Computer Sciences Corporation, Technical Memo- 
randum, September 1982 


†† This article also appears in SEL-82-004, Collected Software
Engineering Papers: Volume 1, July 1982.


B-6 


††Chen, E., and M. V. Zelkowitz, "Use of Cluster Analysis To
Evaluate Software Engineering Methodologies," Proceedings
of the Fifth International Conference on Software
Engineering. New York: Computer Societies Press, 1981

Freburger, K., "A Model of the Software Life Cycle" (paper 
prepared for the University of Maryland, December 1978) 

Higher Order Software, Inc., TR-9, A Demonstration of AXES
for NAVPAK, M. Hamilton and S. Zeldin, September 1977 (also
designated SEL-77-005)

Hislop, G., "Some Tests of Halstead Measures" (paper pre- 
pared for the University of Maryland, December 1978) 

Lange, S. F., "A Child's Garden of Complexity Measures"
(paper prepared for the University of Maryland, December
1978)

Miller, A. M., "A Survey of Several Reliability Models"
(paper prepared for the University of Maryland, December
1978)

National Aeronautics and Space Administration (NASA), NASA
Software Research Technology Workshop (proceedings), March
1980

Page, G., "Software Engineering Course Evaluation," Computer 
Sciences Corporation, Technical Memorandum, December 1977 

Parr, F., and D. Weiss, "Concepts Used in the Change Report
Form," NASA, Goddard Space Flight Center, Technical
Memorandum, May 1978

Reiter, R. W., "The Nature, Organization, Measurement, and
Management of Software Complexity" (paper prepared for the
University of Maryland, December 1976)

Scheffer, P. A., and C. E. Velez, "GSFC NAVPAK Design Higher
Order Languages Study: Addendum," Martin Marietta
Corporation, Technical Memorandum, September 1977

Turner, C., and G. Caron, A Comparison of RADC and NASA/SEL
Software Development Data, Data and Analysis Center for
Software, Special Publication, May 1981


†† This article also appears in SEL-82-004, Collected Software
Engineering Papers: Volume 1, July 1982.


B-7 




Turner, C., G. Caron, and G. Brement, NASA/SEL Data
Compendium, Data and Analysis Center for Software, Special
Publication, April 1981

Weiss, D. M., "Error and Change Analysis," Naval Research
Laboratory, Technical Memorandum, December 1977

Williamson, I. M., "Resource Model Testing and Information," 
Naval Research Laboratory, Technical Memorandum, July 1979 

††Zelkowitz, M. V., "Resource Estimation for Medium Scale
Software Projects," Proceedings of the Twelfth Conference on
the Interface of Statistics and Computer Science. New York:
Computer Societies Press, 1979

Zelkowitz, M. V., "Data Collection and Evaluation for
Experimental Computer Science Research," Empirical Foundations
for Computer and Information Science (proceedings), November
1982

Zelkowitz, M. V., and V. R. Basili, "Operational Aspects of
a Software Measurement Facility," Proceedings of the Software
Life Cycle Management Workshop, September 1977


†† This article also appears in SEL-82-004, Collected Software
Engineering Papers: Volume 1, July 1982.

B-8 


U.S. GOVERNMENT PRINTING OFFICE: 1983-38 1-79 1:^67



